The aDORe Federation Architecture

    The aDORe Federation Architecture - Presentation Transcript

    1. The aDORe Federation Architecture
       Herbert Van de Sompel (1), Ryan Chute (2), Luydimilla Balakireva (3)
       Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory
       (1) herbertv@lanl.gov, http://public.lanl.gov/herbertv/
       (2) rchute@lanl.gov
       (3) ludab@lanl.gov
       Acknowledgments: Jeroen Bekaert, Patrick Hochstenbach, Henry Jerez, Xiaoming Liu
       Open Repositories 2008, University of Southampton, UK, April 1-4 2008
    2. The Fedora Adoration Architecture? (the mandatory joke at the start of a presentation)
    3. Presentation Based on the Paper
       Herbert Van de Sompel, Ryan Chute, Patrick Hochstenbach. The aDORe Federation: Digital Repositories at Scale. 40 pages. International Journal on Digital Libraries. Special Issue on Very Large Digital Libraries. In Publication, 2008. Preprint available at http://arxiv.org/abs/0803.4511
    4. The aDORe Project: Background
       • Initial motivation:
         o Severe deficiencies in the new information discovery environment developed for the LANL Research Library:
           - Metadata-centric: descriptive metadata records were first-class citizens; actual digital assets were auxiliary data.
           - Tens of millions of digital assets stored as files in a file system.
           - Tight integration between content collection and discovery application, preventing other applications from leveraging the rich content base.
         o Obvious solution:
           - Replace the metadata-centric approach with a compound object approach.
           - Bundle digital assets into storage containers that dramatically reduce the number of files in a file system.
           - Cleanly separate the storage repository from the applications that leverage the stored assets by providing the necessary machine interfaces.
       • Implementation of the obvious solution led to the aDORe R&D project, 2003-2007
    5. The aDORe Project: Major Drivers
       • Concrete need to design and implement a solution to ingest, store, and access the vast and growing collection of the LANL Research Library.
         o Scale, scale, scale!
         o Existing open source solutions (at that time) did not meet our scale requirements
           - e.g. static binding of disseminators to objects in Fedora.
       • Interest in repository interoperability, cf. involvement in OAI-PMH, NISO OpenURL, OAI-ORE
       • Interest in digital preservation, cf. NDIIPP funding
    6. The aDORe Project: Major Design Constraints
       • Leverage existing standards and technologies to make development and migration more straightforward.
         o Read: Laziness as a strategy
       • Use a distributed, component-based approach to meet challenges of scale.
    7. The aDORe Project: Some of the Results
       • Conceptual:
         o aDORe Federation architecture: A high-level, 3-Tier architecture for the federation of distributed repositories.
       • Concrete:
         o The aDORe Archive storage solution (XMLtapes/ARCfiles): Tier-1 of the aDORe Federation architecture.
         o The aDORe Federation software: an implementation of the 3-Tier architecture, with the aDORe Archive in Tier-1.
       • And a lot more, but that’s not for today.
    8. The aDORe Federation software
       • Available today at: http://african.lanl.gov/aDORe/aDOReFederation
       • Thanks to Ryan Chute & Luydimilla Balakireva
       • This is a major update to the aDORe Archive:
         o Updates the Tier-1 aDORe Archive
         o Implements the 3 Tiers of the architecture instead of only Tier-1
       • In production at LANL Research Library for over 1 year
       • Attractive for large collections of relatively stable objects
       • Could be used as a plug-in storage component for IR solutions
    9. The aDORe Archive @ LANL, March 31 2008
       • 87,000,000 Compound Digital Objects
       • 208,000,000 Stored bitstreams
       • ~ 9,200 autonomous repositories:
         o ~ 4,000 XMLtapes: XML-based serializations of Digital Objects
         o ~ 5,200 ARCfiles: bitstreams
       • > 500,000,000 identifiers
       • More about the aDORe Federation software later
    10. The aDORe Federation Architecture: Goal
       • Give client applications a uniform way to discover and access content objects available in a group of distributed repositories.
       • Single repository behavior for a group of distributed repositories.
       • Note that these distributed repositories can very well be “hidden”, with only the federated result made “public”.
         o Cf. Peter Murray-Rust departmental repositories
       • Not about uniform approaches to add, update, or delete objects in repositories.
         o Considered the responsibility of individual repositories.
         o However, changes are made apparent to the federation.
    11. aDORe Federation Architecture
    12. Basic Design Choices
       • All entities in the architecture are identified by means of URIs: content objects, repositories, machine interfaces.
         o Turns entities into uniquely identified Web resources
         o Avoids unwanted identifier collisions, for example, between different content objects from various repositories.
       • Depending on use case, certain Content Objects are identified by either:
         o Protocol-based URIs
         o Non-protocol-based URIs
       • All machine interfaces are (HTTP) protocol-based
    13. Content Objects
       • 3 types of Content Objects:
         o Digital Objects
         o Datastreams
         o Surrogates
       • These types do not need to be natively embraced by repositories; they are supported at federation-facing machine interfaces. They are abstractions.
       • Other properties of Content Objects may be expressed, but the architecture only serves to convey them.
       • Core enabling properties of the architecture, for each of these Content Objects:
         o identification
         o location
         o time-stamp
    14. Content Objects: Digital Object
       • Cf. Kahn-Wilensky, cf. most repository systems
       • A Digital Object is an identified aggregation of one or more Datastreams and properties pertaining to the Datastreams and to the aggregation.
       • A Digital Object is the perspective of a repository’s native compound object that is shared with the federation.
    15. Content Objects: Digital Object
       • Identification: DO-URI
         o Inherited from another environment and/or minted by the repository.
           - Cf. Internet Archive: HTTP URIs of stored objects; identifiers assigned to scanned images
         o One or more per Digital Object
         o Digital Objects with the same DO-URI may exist in multiple repositories
           - Cf. paper with same DOI in multiple IRs; HTTP URI in Internet Archive
         o Can be protocol-based or non-protocol-based:
           - http://some.repo.org/do/1234
           - info:some-repo/do/1234
         o Always treated as non-protocol-based, i.e. never resolved (in the federation) using its native resolution protocol, but rather conveyed as a parameter in requests against the federation’s machine interfaces.
           - Cf. Internet Archive.
       • Time-stamping:
         o Digital Objects change over time
         o Changes are communicated to the federation via Surrogates
    16. Content Objects: Surrogates
       • A Surrogate is the serialization of a Digital Object into a machine-readable representation that is made accessible by a repository, e.g. DIDL, METS, ORE Atom or RDF/XML.
       • Surrogates are the vehicles repositories use to keep the federation informed about the availability of their Digital Objects and about changes those Digital Objects undergo.
       • One or more Surrogates can correspond with a Digital Object in a federation:
         o A Digital Object with the same URI may exist in multiple repositories
         o A single repository may have multiple Surrogates for a Digital Object
       • Minimally expresses:
         o DO-URI of the Digital Object it serializes
         o Datastream-URI/URLs of the constituent Datastreams
         o Identifier of the Surrogate itself
    17. Content Objects: Surrogates
       • Identification: Identifier minted by the repository that makes the Surrogate available.
         o Protocol-based: Surrogate-URL (native resolution)
         o Non-protocol-based: Surrogate-URI (resolution via interfaces)
       • Time-stamping: Surrogate-datetime changes when a change to a Digital Object needs to be communicated to the federation. Minimally when constituency changes.
       • Update Policies:
         o New Surrogate Policy:
           - Change to Digital Object => new Surrogate, new Surrogate-URI/URL, new Surrogate-datetime
           - Previous Surrogate remains available
         o Update Surrogate Policy:
           - Change to Digital Object => updated Surrogate, same Surrogate-URI/URL, new Surrogate-datetime
           - Previous Surrogate no longer available
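The two Surrogate update policies can be contrasted with a small in-memory sketch. This is only an illustration under stated assumptions: the `Surrogate` class, its field names, the dictionary used as a store, and the `info:some-repo/sur/...` identifiers are all hypothetical and are not taken from the aDORe software, where Surrogates are serialized documents (e.g. DIDL) rather than Python objects.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Surrogate:
    """Hypothetical view of the minimum a Surrogate expresses, plus its time-stamp."""
    surrogate_id: str                 # Surrogate-URI or Surrogate-URL
    do_uri: str                       # DO-URI of the Digital Object it serializes
    datastream_ids: Tuple[str, ...]   # Datastream-URIs/URLs of constituent Datastreams
    surrogate_datetime: str           # Surrogate-datetime


def new_surrogate_policy(store: Dict[str, Surrogate], old: Surrogate, new_id: str,
                         datastream_ids: Iterable[str], now: str) -> Surrogate:
    """New Surrogate Policy: new identifier, new datetime; the previous Surrogate stays."""
    new = Surrogate(new_id, old.do_uri, tuple(datastream_ids), now)
    store[new.surrogate_id] = new
    return new


def update_surrogate_policy(store: Dict[str, Surrogate], old: Surrogate,
                            datastream_ids: Iterable[str], now: str) -> Surrogate:
    """Update Surrogate Policy: same identifier, new datetime; the previous version is replaced."""
    updated = replace(old, datastream_ids=tuple(datastream_ids), surrogate_datetime=now)
    store[old.surrogate_id] = updated
    return updated


# Example: one Digital Object gains a second Datastream, then a third.
store: Dict[str, Surrogate] = {}
s1 = Surrogate("info:some-repo/sur/1", "info:some-repo/do/1234",
               ("info:some-repo/ds/5678",), "2008-03-31T00:00:00Z")
store[s1.surrogate_id] = s1
s2 = new_surrogate_policy(store, s1, "info:some-repo/sur/2",
                          ["info:some-repo/ds/5678", "info:some-repo/ds/5679"],
                          "2008-04-01T00:00:00Z")
s3 = update_surrogate_policy(store, s2,
                             ["info:some-repo/ds/5678", "info:some-repo/ds/5679",
                              "info:some-repo/ds/5680"],
                             "2008-04-02T00:00:00Z")
```

After these calls the store holds two Surrogates for the same DO-URI: the original one, kept by the New Surrogate Policy, and the second one, overwritten in place by the Update Surrogate Policy.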
    18. Content Objects: Datastreams
       • A Datastream is a retrievable bitstream of any media type, made available by a repository to the federation.
       • It is a perspective of a repository’s native bitstream that is shared with an aDORe federation.
       • Can be, e.g.:
         o Dissemination of a locally stored bitstream
         o Dissemination of an externally stored bitstream
         o Result of applying a service to a (local or external) bitstream
       • A Datastream can be part of multiple Digital Objects, but there is one repository in the federation that owns/serves it.
    19. Content Objects: Datastreams
       • Identification: Identifier minted by the repository that makes the Datastream available.
         o Protocol-based: Datastream-URL (native resolution)
         o Non-protocol-based: Datastream-URI (resolution via interfaces)
       • Time-stamping: Datastream-datetime changes when a change to a Datastream needs to be communicated to the federation.
       • Update Policies:
         o New Datastream Policy:
           - Update of retrievable bitstream => new Datastream, new Datastream-URI/URL, new Datastream-datetime
           - Cf. digital preservation: migration of a JPEG image (URI-1) leads to JPEG-2000 (URI-2), original JPEG maintained.
         o Update Datastream Policy:
           - Update of retrievable bitstream => same Datastream, same Datastream-URI/URL, new Datastream-datetime
           - Original bitstream no longer available
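The identification rules for the three Content Object types (slides 15, 17 and 19) can be summarized as a small decision function. This is a sketch, not part of the aDORe software: the function names and the `kind` labels are hypothetical, and "protocol-based" is approximated here as "has an `http` or `https` scheme", which is an assumption made only for illustration.

```python
from urllib.parse import urlsplit

# Assumption: an identifier counts as protocol-based when its scheme is http(s).
PROTOCOL_SCHEMES = {"http", "https"}
KINDS = {"digital_object", "surrogate", "datastream"}  # hypothetical labels


def is_protocol_based(identifier: str) -> bool:
    """True when the identifier's URI scheme is one we treat as resolvable natively."""
    return urlsplit(identifier).scheme.lower() in PROTOCOL_SCHEMES


def resolution_route(kind: str, identifier: str) -> str:
    """Return 'native' or 'interface' for a Content Object identifier.

    DO-URIs are always conveyed as a parameter to a federation interface,
    even when they look like HTTP URIs. Surrogate and Datastream identifiers
    resolve natively only when they are protocol-based (URLs).
    """
    if kind not in KINDS:
        raise ValueError(f"unknown Content Object kind: {kind!r}")
    if kind == "digital_object":
        return "interface"
    return "native" if is_protocol_based(identifier) else "interface"


routes = [
    resolution_route("digital_object", "http://some.repo.org/do/1234"),
    resolution_route("datastream", "http://some.repo.org/ds/5678"),
    resolution_route("datastream", "info:some-repo/ds/5678"),
]
```

The first call is the notable case: an HTTP DO-URI is still routed through the federation's interfaces rather than dereferenced directly.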
    20. aDORe Federation Architecture: Tier-1
    21. Tier-1: Surrogate and (sometimes) Datastream Repositories
       • Surrogate Repositories, Datastream Repositories, and their Interfaces are all identified by URI
       • Interfaces leverage the identification and time-stamping of Content Objects
       • A Datastream Repository is present only when (non-protocol-based) Datastream-URIs are used
    22. aDORe Federation Architecture: Tier-1
    23. aDORe Federation Architecture: Tier-1
    24. Tier-1: Locate Surrogates
       • Use: Repositories that have multiple Surrogates for a given Digital Object, or that have Digital Objects that share Datastreams.
       • http://some.repo.org/openurl?url_ver=z39.88-2004&rft_id=http://some.repo.org/ds/5678&svc_id=info:ourfederation/svc/LocateSurrogates
    25. Tier-1: Obtain Datastream
       • Use: For repositories that use Datastream-URIs, not Datastream-URLs
       • http://some.repo.org/openurl?url_ver=z39.88-2004&rft_id=info:some-repo/ds/5678&svc_id=info:ourfederation/svc/ObtainDatastream
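The Locate Surrogates and Obtain Datastream requests on slides 24 and 25 share one shape: a base URL plus `url_ver`, `rft_id` and `svc_id` parameters. A minimal sketch of building such a request follows. The helper name `build_openurl` is hypothetical; the parameter names, the `z39.88-2004` value and the `info:ourfederation/svc/` prefix are copied from the example URLs on the slides. Note that `urlencode` percent-encodes reserved characters such as `:` and `/`, whereas the slides show the values unencoded for readability; both decode to the same parameter values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SVC_PREFIX = "info:ourfederation/svc/"  # prefix as it appears in the slide examples


def build_openurl(base_url: str, rft_id: str, service: str) -> str:
    """Build an OpenURL-style request in which the identifier is passed as a parameter."""
    query = urlencode({
        "url_ver": "z39.88-2004",
        "rft_id": rft_id,                 # identifier conveyed, never resolved natively
        "svc_id": SVC_PREFIX + service,   # which service the interface should perform
    })
    return f"{base_url}?{query}"


obtain_url = build_openurl("http://some.repo.org/openurl",
                           "info:some-repo/ds/5678", "ObtainDatastream")
locate_url = build_openurl("http://some.repo.org/openurl",
                           "http://some.repo.org/ds/5678", "LocateSurrogates")

# Decoding the query string recovers the original, unencoded values.
decoded = parse_qs(urlsplit(obtain_url).query)
```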
    26. aDORe Federation Architecture: Tier-2
    27. Tier-2: Service Registry
       • Keeps track of all components in the federation. In essence, 2 look-up tables.
       • Look-up table 1:
         o URI of component (e.g. Repository-URI)
         o Matching Interface-URIs (and Interface type)
       • Look-up table 2:
         o Interface-URI
         o Interface-URL
       • Exposes minimally 1 interface: Obtain Registry Record
       • Cf. Information Environment Service Registry (IESR)
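The two look-up tables of the Service Registry can be sketched as two dictionaries joined on the Interface-URI. Everything concrete here is an assumption for illustration: the function name `obtain_registry_record`, the shape of the returned record, and all example URIs and URLs are hypothetical. The actual registry described later in the deck is database-backed.

```python
from typing import Dict, List, Tuple

# Look-up table 1: component URI -> [(Interface-URI, interface type), ...]
component_interfaces: Dict[str, List[Tuple[str, str]]] = {
    "info:ourfederation/repo/tape-1": [
        ("info:ourfederation/repo/tape-1/if/locate", "LocateSurrogates"),
        ("info:ourfederation/repo/tape-1/if/harvest", "HarvestSurrogates"),
    ],
}

# Look-up table 2: Interface-URI -> Interface-URL
interface_urls: Dict[str, str] = {
    "info:ourfederation/repo/tape-1/if/locate": "http://some.repo.org/openurl",
    "info:ourfederation/repo/tape-1/if/harvest": "http://some.repo.org/oai",
}


def obtain_registry_record(component_uri: str) -> List[Dict[str, str]]:
    """Join both tables: list every interface of a component with its network location."""
    record = []
    for interface_uri, interface_type in component_interfaces.get(component_uri, []):
        record.append({
            "interface_uri": interface_uri,
            "interface_type": interface_type,
            "interface_url": interface_urls[interface_uri],
        })
    return record


record = obtain_registry_record("info:ourfederation/repo/tape-1")
```

Keeping the Interface-URL in a separate table means a repository can move to a new host by changing one entry, while every component that refers to it by URI stays untouched.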
    28. Tier-2: Identifier Locator
       • Look-up table:
         o Identifiers of Content Objects in the federation
         o Identifiers of the Datastream or Surrogate Repositories that make these Content Objects accessible
         o Must store this information for non-protocol-based identifiers
           - Minimally DO-URI (remember: treated as non-protocol-based)
       • Populated by recurrently interacting with the Harvest Surrogates and Harvest Datastream Identifier interfaces of all Tier-1 repositories.
       • Identifier Locator knows about these interfaces via the Service Registry.
       • Exposes minimally 1 interface, Locate Repositories:
         o In: DO-URI, Surrogate-URI, Datastream-URI
         o Out: List of Repository-URIs
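The Identifier Locator behaves like an inverted index from identifier to Repository-URIs. The sketch below assumes that harvesting has already produced, per repository, a batch of identifiers. The class and method names and the example Repository-URIs are hypothetical, and this in-memory dictionary only stands in for whatever storage a real deployment uses.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


class IdentifierLocator:
    """Hypothetical in-memory index: identifier -> set of Repository-URIs."""

    def __init__(self) -> None:
        self._index: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_harvest(self, repository_uri: str, identifiers: Iterable[str]) -> None:
        """Register identifiers (DO-URI, Surrogate-URI, Datastream-URI) harvested from one repository."""
        for identifier in identifiers:
            self._index[identifier].add(repository_uri)

    def locate_repositories(self, identifier: str) -> List[str]:
        """Locate Repositories: return the Repository-URIs that expose the identifier."""
        # .get() avoids creating empty entries for unknown identifiers.
        return sorted(self._index.get(identifier, ()))


locator = IdentifierLocator()
locator.record_harvest("info:ourfederation/repo/tape-1",
                       ["info:some-repo/do/1234", "info:some-repo/sur/1"])
locator.record_harvest("info:ourfederation/repo/tape-2",
                       ["info:some-repo/do/1234"])

repos = locator.locate_repositories("info:some-repo/do/1234")
```

Because the same DO-URI may exist in multiple repositories, the lookup returns a list rather than a single Repository-URI; here both hypothetical repositories are returned for `info:some-repo/do/1234`.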
    29. aDORe Federation Architecture: Tier-3
    30. aDORe Federation software
    31. The aDORe Archive, new release
       • An updated aDORe Archive Installer containing:
         o XMLtape Toolkit (updated)
         o ARCfile Toolkit (updated)
         o XMLtape Registry (updated)
         o ARCfile Registry (updated)
         o XMLtape OpenURL Resolver (new)
           - Provides Core Surrogate Services (e.g. Obtain, Locate, Harvest Identifiers) through a simple and efficient OpenURL Service Interface.
         o XMLtape OpenURL XQuery Resolver (new)
           - Provides a configuration-based solution for complex ad-hoc queries. Built upon Nux <http://dsd.lbl.gov/nux/>, an open-source Java toolkit that provides a scalable solution for non-index-based search of large XML repositories
    32. The aDORe Archive, new release
       • A new aDORe Federation Installer providing:
         o An aDORe Archive installation
         o IESR-based Service Registry (new)
           - Uses the Ockham Service Registry IESR-based database schema and provides OAI-PMH and OpenURL Services.
         o Identifier Locator (new)
           - A fast, in-memory MySQL-based solution used for efficient resolution of Digital Object, Datastream, and Surrogate URIs to Repository URIs.
         o OAI-PMH Federator (new)
           - Provides access to multiple aDORe Archive installations through a common OAI-PMH interface.
         o OpenURL Disseminator (new)
           - OpenURL Service interface that provides federated access to all repository content and performs dynamic dissemination services using a rule-engine-based plug-in framework.
    33. The aDORe Archive, new release
       • The documentation will provide:
         o Detailed presentations of available service interfaces and available pluggable interfaces.
         o A sample ingestion of DIDL content to illustrate the various key service interfaces available in the aDORe Federation
         o A tutorial showing how to create processing implementations and the necessary configurations
         o A public demo version of the aDORe Federation using public domain content (coming soon)

    Herbert Van de Sompel, 5 months ago

    CC Attribution-NonCommercial-ShareAlike License
