Lessons in Cross-Repository Interoperability learned from the aDORe effort

Transcript

  • 1. Lessons in Cross-Repository Interoperability learned from the aDORe effort. Herbert Van de Sompel & Jeroen Bekaert, Research Library, Los Alamos National Laboratory, USA. CNI Task Force Meeting, December 5-6 2005, Phoenix, Arizona
  • 2. The repository model. "Pattern Recognition: The 2003 OCLC Environmental Scan" http://www.oclc.org/membership/escan/toc.htm
  • 3. Value chains starting in repositories • New knowledge is really being created when allowing for non-anticipated use of stuff. • These repositories are not about creating services for local users (only) • These repositories are not about creating a service (user interface) for all users • These repositories are about facilitating the use of materials in many contexts • These repositories are the starting point of value chains
  • 4. • Value chains emerging from RSS feeds http://www.technorati.com
  • 5. Value chains starting in repositories [diagram labels: recombine, add value]
  • 6. • Scholarly Communication value chains http://dx.doi.org/10.1045/september2004-vandesompel
  • 7. Value chains starting in repositories To allow value chains to emerge on the basis of materials in repositories, those repositories need clear/clean machine interfaces that allow downstream applications to consume materials, aggregate them, build services, … ⇒ Disconnection of repository content and service: allows for creation of both local and remote services ⇒ On-Web: Protocol-oriented interfaces ⇒ These value chains are about the real stuff not (only) about metadata
  • 8. Credits The reported material is based on the following work: o The LANL aDORe repository effort o The upcoming PhD thesis by Jeroen Bekaert (Advisor Herbert Van de Sompel) regarding protocol-based interfaces for Open Archival Information Systems (OAIS) o The NSF-funded Pathways project in collaboration with the Information Science group at Cornell University (Carl Lagoze, Sandy Payette, Simeon Warner)
  • 9. Outline • aDORe: A few words about the aDORe architecture • A Federation of Repositories: A new level of cross-repository interoperability • Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
  • 10. aDORe effort aDORe is 2 things: o Standards-based, modular repository architecture - Distributed architecture - Protocol-based interactions between modules - Applicable to create interoperable federations of heterogeneous repositories o Actual implementation of the architecture at LANL for local storage of digital assets (currently in its 2nd version) aDORe is not a product o Components of aDORe software, usable in other environments, will be released
  • 11. aDORe effort [diagram: Standards, Distributed architecture, Protocol-based communication ⇒ Insights in Cross-Repository Interoperability]
  • 12. aDORe effort • Standards used in aDORe include: o XML, o XML Schema, o MPEG-21 Digital Item Declaration, o MPEG-21 Digital Item Identification, o W3C XML Signatures, o OAI-PMH, o NISO OpenURL Framework for Context-Sensitive Services, o Internet Archive ARC file format, o OAIS concepts
  • 13. Compound objects Identifier Locator Repository Registry
  • 14. OAI-PMH Federator Dynamic Dissemination Engine OpenURL Resolver
  • 15. OAI-PMH Federator & OpenURL Resolver (table columns: aDORe front-end; Interface standard; OAIS identifier Type; Access; # items in response)
    o OAI-PMH Federator; OAI-PMH; Package Identifier; OAIS DIP; 1 or more
    o OpenURL Resolver; NISO OpenURL; Content Identifier, Package Identifier (with XML ID fragment); OAIS DIP & Result Set; 1
  • 16. http://cordra.net
  • 17. Outline • aDORe: A few words about the aDORe architecture • A Federation of Repositories: A new level of cross-repository interoperability • Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
  • 18. The interoperable repository model I will try to show that: • a significantly higher level of cross-repository interoperability can be achieved with relatively modest means • those means are largely available and agreed upon in our community I will introduce: • Repository-level requirements • Infrastructure-level requirements
  • 19. Part 1: Requirements for a repository in a federation
  • 20. Repositories & Units of Communication • Data-oriented research ⇒ not only textual materials, but also datasets, software, simulations, dynamic knowledge presentations, … • Research results represented by a variety of digital media ⇒ these media must receive status similar to that of text in the current system • Materials in various stages of certification ⇒ units of communication are not only ‘papers’ but also preprints, raw datasets, prototype simulations, … • Facilitate collaboration ⇒ re-use of units of communication
  • 21. Repositories & Units of Communication • Handling this requires: o a compound object view of a unit of communication o to stop thinking in terms of metadata versus content • Compound object: o Has a persistent identifier o Contains materials and metadata about those materials o Can contain other compound objects
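The compound object view on slide 21 can be pictured as a small recursive data structure. The following is only a minimal sketch under stated assumptions: the class names `CompoundObject` and `Datastream`, the `role` field, and the identifier `URI_10` are hypothetical and do not come from aDORe or from any of the formats named in the slides.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Datastream:
    """A piece of material, or metadata about it (hypothetical class)."""
    uri: str        # an identifier, not necessarily a locator
    mime_type: str
    role: str       # hypothetical label such as "metadata" or "content"


@dataclass
class CompoundObject:
    """A unit of communication with a persistent identifier."""
    uri: str
    parts: List[Union["CompoundObject", Datastream]] = field(default_factory=list)

    def walk(self) -> Iterator[str]:
        """Yield this object's URI, then the URIs of everything nested in it."""
        yield self.uri
        for part in self.parts:
            if isinstance(part, CompoundObject):
                yield from part.walk()  # compound objects can nest
            else:
                yield part.uri


# A compound object that holds one datastream and one nested compound object.
paper = CompoundObject("URI_7", [
    Datastream("URI_3", "application/xml", "metadata"),
    CompoundObject("URI_9", [Datastream("URI_10", "application/pdf", "content")]),
])
print(list(paper.walk()))
```

Because metadata and content are both just parts of the object, the sketch has no separate "metadata record" type, which is one way to read the slide's advice to stop thinking in terms of metadata versus content.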
  • 22. Compound objects [diagram: a compound object whose parts are identified by URI_7, URI_3, URI_9] URIs: • minted by different repositories • from different namespaces • not (necessarily) locators
  • 23. XML-based representation of compound objects [diagram: the compound object (URI_7, URI_3, URI_9) and its XML-based representation (URI_7, URI_3, URI_9); candidate formats: MPEG-21 DIDL, METS, IMS/CP, RDF]
  • 24. Repository Interop Interface 1: OAI-PMH & CO [diagram labels: compound object (URI_7, URI_3, URI_9), repository_a, OAI-PMH baseURL_m, OAI-PMH harvester] • machine consumption • batches of compound objects • OAI-PMH datestamp ~ new version of object
  • 25. OAI-PMH interface to OAIS (Jeroen Bekaert) [diagram: requests issued by an agent]
    o Request: baseURL(OAIPMH_CIID)?verb=ListMetadataFormats ⇒ Response: list of DIP formats (ListMetadataFormats response)
    o Request: baseURL(OAIPMH_CIID)?verb=ListRecords&metadataPrefix=info:pathways/svc/dip.rdf ⇒ Response: list of DIPs (derived from most recent AIPs)
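The two requests on slide 25 can be assembled with the Python standard library. This is an illustrative sketch, not code from aDORe: the base URL `http://repository.example.org/oai` and the helper name `oai_pmh_url` are assumptions, while the verbs and the `metadataPrefix` value are the ones shown on the slide.

```python
from urllib.parse import urlencode

# metadataPrefix value taken from the slide.
DIP_RDF = "info:pathways/svc/dip.rdf"

# Hypothetical base URL standing in for baseURL(OAIPMH_CIID).
BASE_URL = "http://repository.example.org/oai"


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL; urlencode percent-encodes the values."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Step 1: ask which DIP formats the repository can disseminate.
list_formats = oai_pmh_url(BASE_URL, "ListMetadataFormats")

# Step 2: harvest DIPs in the RDF-based format.
list_records = oai_pmh_url(BASE_URL, "ListRecords", metadataPrefix=DIP_RDF)

print(list_formats)
print(list_records)
```

Note that the `:` and `/` characters in the prefix are percent-encoded in the resulting query string, so the URL on the wire looks different from the unencoded form printed on the slide.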
  • 26. Repository Interop Interface 1: OAI-PMH & CO [diagram labels: URI_12, add value, recombine, compound object (URI_7, URI_3, URI_9), OAI-PMH harvester, repository_b, OAI-PMH baseURL_n] • include provenance ~ version of compound object
  • 27. Repository Interop Interface 2: OpenURL & CO [diagram labels: compound object (URI_7, URI_3, URI_9), repository_n, OpenURL baseURL_o] Request: baseURL_x?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/svc/dip.* • machine (& human) consumption • single object dissemination ~ identifier of compound object
  • 28. Repository Interop Interface 2: OpenURL & CO • ServiceType = Request a representation of the DO expressed using a compound object format o Example: - svc_id = info:pathways/svc/dip.didl (request MPEG-21 DIDL representation) - svc_id = info:pathways/svc/dip.mets (request METS representation) - svc_id = info:pathways/svc/dip.rdf (request RDF representation – see later) • Other Entities could be added to Interface #2 (think Requester)
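The key/encoded-value requests on slides 27 and 28 can be sketched the same way. The function name `openurl_request` and the base URL `http://repository.example.org/openurl` are hypothetical; the keys `url_ver`, `rft_id`, and `svc_id`, and the `info:pathways/svc/dip.*` service identifiers, are the ones shown on the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL standing in for the repository's OpenURL interface.
BASE_URL = "http://repository.example.org/openurl"


def openurl_request(base_url: str, referent_id: str, dip_format: str) -> str:
    """Ask for one compound object in one representation format."""
    params = {
        "url_ver": "Z39.88-2004",                         # OpenURL version key
        "rft_id": referent_id,                            # identifier of the object
        "svc_id": f"info:pathways/svc/dip.{dip_format}",  # requested service
    }
    return f"{base_url}?{urlencode(params)}"


# The same object requested in the three formats named on slide 28.
for fmt in ("didl", "mets", "rdf"):
    print(openurl_request(BASE_URL, "URI_7", fmt))

# Decoding the query recovers the unencoded values.
decoded = parse_qs(urlsplit(openurl_request(BASE_URL, "URI_7", "rdf")).query)
print(decoded["svc_id"][0])
```

Only the `svc_id` changes between the three requests, which illustrates the point that the interface stays the same regardless of the representation format a client asks for.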
  • 29. Repository Interop Interface 2: OpenURL & CO • independent of nature of identifiers • ‘resolution’ independent of scheme-specific mechanisms • conceptual interface is persistent over time • KEV & HTTP (REST) • XML & SOAP • … [diagram labels: OpenURL, repository_n, OpenURL baseURL_o]
  • 30. OpenURL interface to OAIS (Jeroen Bekaert) [diagram: requests issued by an agent]
    o Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&svc_id=info:pathways/svc/dip ⇒ Response: list of ContextObjects, one for each AIP (version), each of the form BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip
    o Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip ⇒ Response: list of ContextObjects, one for each DIP format, each of the form BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip.*
    o Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip.rdf ⇒ Response: DIP (RDF)
  • 31. Part 2: Requirements for an infrastructure supporting a federation of repositories
  • 32. Repository Registry: Who is part of the Federation? [diagram: repositories register with the Repository Registry] Per Repository: • Repository identifier • baseURL of OAI-PMH interface • baseURL of OpenURL interface • any other information that helps downstream applications understand the nature of the repository
  • 33. Object Registry: What is part of the Federation? Per compound object: • Object identifier • Object datetime ~ OAI-PMH datestamp • OAI-PMH identifier • Repository identifier of the object itself, and of its contained objects [diagram labels: Object Registry, SRU, SRW, handle, harvest (identifiers)]
  • 34. OAI-PMH & OpenURL access to objects in federation [diagram labels: compound object (URI_7, URI_3, URI_9), Repository Registry, Object Registry, SRU, SRW, handle, lookup of URI_7] • List of existing copies • Per copy: o OAI-PMH access info o OpenURL access info
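The two registries on slides 32 to 34 can be read as a pair of lookup tables joined on the repository identifier. The sketch below is an in-memory stand-in under that assumption; it does not use the Handle system or SRU/SRW mentioned in the slides, and every class, field, and URL in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RepositoryRecord:
    """Repository Registry entry: who is part of the federation."""
    repository_id: str
    oai_pmh_base_url: str
    openurl_base_url: str


@dataclass(frozen=True)
class ObjectRecord:
    """Object Registry entry: one copy of a compound object."""
    object_id: str
    datestamp: str       # corresponds to the OAI-PMH datestamp
    oai_pmh_id: str
    repository_id: str


class Federation:
    """In-memory stand-in for the Repository Registry and Object Registry."""

    def __init__(self) -> None:
        self.repositories: Dict[str, RepositoryRecord] = {}
        self.objects: Dict[str, List[ObjectRecord]] = {}

    def register_repository(self, record: RepositoryRecord) -> None:
        self.repositories[record.repository_id] = record

    def register_object(self, record: ObjectRecord) -> None:
        self.objects.setdefault(record.object_id, []).append(record)

    def locate(self, object_id: str) -> List[Dict[str, str]]:
        """List the existing copies of an object, with access info per copy."""
        copies = []
        for obj in self.objects.get(object_id, []):
            repo = self.repositories[obj.repository_id]
            copies.append({
                "repository": repo.repository_id,
                "datestamp": obj.datestamp,
                "oai_pmh_base_url": repo.oai_pmh_base_url,
                "oai_pmh_identifier": obj.oai_pmh_id,
                "openurl_base_url": repo.openurl_base_url,
            })
        return copies


federation = Federation()
federation.register_repository(RepositoryRecord(
    "repository_a", "http://a.example.org/oai", "http://a.example.org/openurl"))
federation.register_repository(RepositoryRecord(
    "repository_b", "http://b.example.org/oai", "http://b.example.org/openurl"))
federation.register_object(ObjectRecord("URI_7", "2005-11-01", "oai:a:7", "repository_a"))
federation.register_object(ObjectRecord("URI_7", "2005-11-20", "oai:b:7", "repository_b"))

for copy in federation.locate("URI_7"):
    print(copy)
```

Keeping the base URLs only in the repository records means a repository that moves its interfaces has to update one registry entry rather than every object entry.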
  • 35. Part 3: Summary of requirements
  • 36. Summary of requirements (table columns: Requirement; Repository; Infrastructure)
    o Compound Object model support; X; (blank)
    o XML-based representations support; X; ?
    o OAI-PMH CO support; X; (blank)
    o OpenURL CO support; X; (blank)
    o Repository Registry; (blank); X
    o Object Registry; (blank); X
  • 37. Summary of requirements Many variations on the design possible, yet most of this can be achieved with: • Off-the-shelf tools o OAI-PMH tools o Handle system, SRU/W tools o OpenURL tools o Tools to generate XML-based representations of objects • Surprisingly little effort • A feasible amount of coordination/specification • Some shared infrastructure
  • 38. The Good News ™ • Microsoft & Mellon Foundation interested in taking interoperability across repositories to a new level • Meeting is being planned to consult with the major stakeholders on identifying, specifying, and implementing concrete ways forward • Tony Hey has (t)asked me to call that meeting
  • 39. Outline • aDORe: A few words about the aDORe architecture • A Federation of Repositories: A new level of cross-repository interoperability • Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
  • 40. Pathways InterDisseminator Service Overlay • Pathways InterDisseminator: Dynamic Service-Oriented Overlay upon the federated architecture • Assumes the existence of: • OpenURL Interface to all repositories in the federation • Object Registry (given an identifier, at which OpenURL interface is the object available?) • Availability of an RDF-based representation of DO compliant with a Pathways OWL core ontology • Is itself exposed as a different OpenURL Resolver
  • 41. Pathways InterDisseminator: core ontology [diagram: classes Entity, Location, Representation, Format; properties hasEntity, hasLocation, hasRepresentation, hasFormat]
  • 42. [diagram labels: Application, OpenURL ContextObject Container, Service Overlay, magic engine, RDF, Interop Interface 2 (OpenURL), DSpace, Fedora, aDORe, compound object (URI_7, URI_3, URI_9)]
    o Request: baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/boostrap
    o Request: baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/dip.rdf
  • 43. Pathways InterDisseminator Service Overlay • Part of the dissemination OpenURL Application is an engine that dynamically decides upon services for a given object from a repository (in a federation). o It grabs the (RDF) representation of the DO from its origin repository o It introspects on the properties expressed in that (RDF) representation o It compares these properties with its knowledge database o It returns a list of possible services/disseminations • There can be many of these engines in a federation. The result is the ability to provide context-sensitive disseminations of DOs in (a federation of) repositories.
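The four engine steps on slide 43 can be sketched as a simple rule match. This is a loose illustration, not the Pathways implementation: the RDF fetch and parse step is stubbed out as a callable that returns a property dictionary, and the rule format, the service identifiers under `info:example/`, and all function names are hypothetical. Only the property name `hasFormat` is borrowed from the core ontology slide.

```python
from typing import Callable, Dict, List, Set

# Properties of a digital object as they might be read from its RDF
# representation: property name -> set of values (assumed shape).
Properties = Dict[str, Set[str]]

# Hypothetical knowledge base: a service applies when every required
# property value is present on the object.
KNOWLEDGE_BASE: List[dict] = [
    {"service": "info:example/svc/view-pdf",
     "requires": {"hasFormat": "application/pdf"}},
    {"service": "info:example/svc/show-metadata",
     "requires": {"hasFormat": "application/xml"}},
    {"service": "info:example/svc/play-video",
     "requires": {"hasFormat": "video/mpeg"}},
]


def possible_services(object_id: str,
                      fetch: Callable[[str], Properties],
                      knowledge_base: List[dict]) -> List[str]:
    """Return the services whose requirements the object satisfies."""
    # Steps 1 and 2: obtain the representation and read its properties.
    properties = fetch(object_id)
    # Steps 3 and 4: compare against the knowledge base, collect matches.
    return [
        rule["service"]
        for rule in knowledge_base
        if all(value in properties.get(name, set())
               for name, value in rule["requires"].items())
    ]


def fake_fetch(object_id: str) -> Properties:
    """Stub for retrieving and parsing the RDF from the origin repository."""
    return {"hasFormat": {"application/xml", "application/pdf"}}


print(possible_services("URI_7", fake_fetch, KNOWLEDGE_BASE))
```

Passing `fetch` in as a parameter keeps the matching logic independent of any one repository, which fits the slide's remark that there can be many such engines in a federation.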
  • 44. Pathways InterDisseminator Service Overlay • There can be many of these engines in a federation. The result is the ability to provide context-sensitive disseminations of DOs in (a federation of) repositories.
  • 45. [diagram labels: service execution engine, web service, RDF, DSpace, Fedora, aDORe, compound object (URI_7, URI_3, URI_9)]
    o Request: baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:magic/justdoit
    o Request: baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/dip.rdf
  • 46. Pathways InterDisseminator Demo: aDORe Digital Object in Demo (table columns: Item; Type; MIME; identifier)
    o Digital Object; scholarly paper; N/A; DOI
    o Constituent Datastream 1; metadata record (MARCXML); application/xml; aDORe datastream id (info URI)
    o Constituent Datastream 2; metadata record (original metadata); application/xml; aDORe datastream id (info URI)
    o Constituent Datastream 3; fulltext file; application/pdf; aDORe datastream id (info URI)
  • 47. Demo • Install TSCC codec (http://www.techsmith.com) • Launch movie Pathways_InterDisseminator.avi in same path as this presentation
  • 48. Comments, Flames, Questions