Biblio to Fedora Commons REST API


This is a brief overview of how we'll glue Biblio and Fedora Commons together for the Biodiversity Heritage Library. It ties together many pieces of the project and touches on how we'll use Fedora Commons as a preservation layer for the corpus of BHL data.

Published in: Technology, Education


  1. Biblio to Fedora Commons REST API. Chris Moyers, March 30, 2009, CiteBank
  2. CiteBank, Biblio, and Fedora Commons
     • CiteBank uses Biblio as its presentation layer.
       • Biblio and Drupal provide much in the way of content management.
     • CiteBank uses Fedora Commons as its preservation layer.
       • In the event of a disaster, Fedora will help make sense of the massive body of content.
  3. Biblio
     • Stores records (bibliographic citations) in MySQL.
     • Allows users to attach files to bibliographic citations. These could be PDFs, images, OCR text, etc.
     • Uploaded files reside in the filesystem. Upload metadata is stored in MySQL.
  4. Fedora Commons
     • A Digital Object is a unit of storage in Fedora. A journal article could be a Digital Object, for example. Digital Objects contain Datastreams.
     [Diagram: a Digital Object (stored as FOXML files in the filesystem) containing Datastream 1, Datastream 2, … Datastream n; datastream contents are labeled "XML metadata" and "URI".]
  5. Fedora Commons
     • Stores Digital Object metadata on the filesystem as FOXML files.
     • Allows Externally Referenced datastreams, which simply point to files external to Fedora.
     • FOXML provides context for external files referenced within a datastream.
     • Has a SOAP API (API-M) that allows programmatic ingestion, modification, and purging of digital objects and datastreams.
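
As a rough illustration of the Digital Object structure described in slides 4 and 5, the sketch below builds a simplified FOXML document whose datastreams are Externally Referenced (control group "E"), so each one only points at a file that lives outside Fedora. The element and attribute names follow the FOXML 1.1 format as commonly documented, but the output is not validated against the schema; the PID, label, and URLs in the usage note are hypothetical, and `build_foxml` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# FOXML namespace URI; the "foxml" prefix is registered only for readable output.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def _q(tag):
    """Return the namespace-qualified form of a FOXML tag name."""
    return f"{{{FOXML_NS}}}{tag}"


def build_foxml(pid, label, datastreams):
    """Build a simplified FOXML document as a string.

    datastreams is a list of (datastream_id, mime_type, url) tuples.
    Every datastream is written as Externally Referenced (CONTROL_GROUP "E").
    """
    root = ET.Element(_q("digitalObject"), {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(root, _q("objectProperties"))
    ET.SubElement(props, _q("property"), {
        "NAME": "info:fedora/fedora-system:def/model#label",
        "VALUE": label,
    })
    for ds_id, mime_type, url in datastreams:
        ds = ET.SubElement(root, _q("datastream"), {
            "ID": ds_id,
            "STATE": "A",
            "CONTROL_GROUP": "E",
            "VERSIONABLE": "true",
        })
        version = ET.SubElement(ds, _q("datastreamVersion"), {
            "ID": f"{ds_id}.0",
            "LABEL": ds_id,
            "MIMETYPE": mime_type,
        })
        # The file itself stays outside Fedora; FOXML only records where it is.
        ET.SubElement(version, _q("contentLocation"), {"TYPE": "URL", "REF": url})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_foxml("citebank:42", "Example article", [("BIBTEX", "text/plain", "http://example.org/files/42.bib")])` would produce one digital object with a single datastream pointing at the (hypothetical) BibTeX file.
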
  6. Distributed Filesystem (DFS)
     • The DFS will create redundant copies of data across nodes.
     • Files are not stored on the DFS in a way that relates them to one another, because Fedora and Drupal handle files differently.
     • Drupal files are in one place, and the metadata used to bind them together is in another place (MySQL). FOXML files are in yet another place.
     • Consolidate filesystem paths?
     • In the event of a disaster or a change of technology, we need to be able to retain and make sense of the data.
  7. Why Yet Another REST API?
     • Biblio and Fedora are not integrated.
     • Islandora is a module for Drupal that interfaces with Fedora Commons.
       • Requires the user to explicitly add content to Fedora.
       • No direct tie-in with Biblio.
       • Adds complexity to Fedora Commons.
     • We need some way to bridge the gap.
  8. Goals of REST API
     • Get data from Biblio to Fedora in a fashion that is transparent to the end user.
     • Digital Objects must be self-describing.
     • Keep it as implementation-neutral as possible.
     • Provide some basic logging and recovery from failure.
     • Make something that can be contributed back to the community.
     • Keep it as simple as possible, both to maintain and to implement.
  9. Process Overview
  10. The REST API
     • Biblio provides the node ID for the citation.
     • The API then performs several steps:
       • Generate Fedora's PID from the node ID provided by Biblio.
       • Determine whether the item already exists in Fedora, using the SOAP APIs (API-M, API-A).
       • Create a BibTeX file for the citation. This is essentially a duplicate of the record Biblio creates within MySQL.
       • Add, change, or delete the digital object with its datastreams (including the BibTeX file).
       • Perform logging.
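
The steps on slide 10 can be sketched as follows. This is a minimal outline under stated assumptions, not the project's actual code: the `citebank` PID namespace, the function names, and the shape of the `record` dictionary are all hypothetical, and the existence check is passed in as a callable (`object_exists`) standing in for whatever API-A or API-M lookup the real implementation uses. The BibTeX writer is deliberately naive and does not escape braces in field values.

```python
import logging

log = logging.getLogger("biblio_fedora")


def make_pid(node_id, namespace="citebank"):
    """Derive a Fedora PID from a Drupal node ID (the namespace is an assumption)."""
    node_id = int(node_id)
    if node_id <= 0:
        raise ValueError("node ID must be a positive integer")
    return f"{namespace}:{node_id}"


def to_bibtex(pid, entry_type, fields):
    """Serialize a citation as a BibTeX entry; field values are not escaped."""
    key = pid.replace(":", "_")
    lines = [f"@{entry_type}{{{key},"]
    for name, value in sorted(fields.items()):
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


def choose_action(exists_in_fedora, deleted_in_biblio):
    """Map the current state onto an add, change, or delete operation."""
    if deleted_in_biblio:
        return "purge" if exists_in_fedora else "skip"
    return "modify" if exists_in_fedora else "ingest"


def sync_node(node_id, record, object_exists, deleted=False):
    """Plan the Fedora operation for one Biblio node and log the decision.

    object_exists is a callable taking a PID and returning True or False;
    it stands in for the real lookup against Fedora.
    """
    pid = make_pid(node_id)
    action = choose_action(object_exists(pid), deleted)
    bibtex = None
    if action in ("ingest", "modify"):
        bibtex = to_bibtex(pid, record["type"], record["fields"])
    log.info("node %s -> %s: %s", node_id, pid, action)
    return {"pid": pid, "action": action, "bibtex": bibtex}
```

Keeping the decision logic separate from the Fedora calls makes it easy to test without a running repository, and the returned plan can be handed to whatever code actually talks to API-M.
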
  11. Recovery, Redundancy
     • The DFS creates redundant copies of files.
     • Biblio records are replicated as BibTeX records in the filesystem and tied to PDFs, images, etc., by a Fedora Digital Object.
     • FOXML files from Fedora are stored on the DFS.
     • Loosely coupled, technology-independent, and redundant.
  12. Recovery, Redundancy (cont'd)
     • The DFS tolerates hardware failure and runs on commodity hardware.
     • Data is kept in open formats, so it is not held "hostage" by Drupal, Biblio, or Fedora.
       • FOXML provides context for resources within datastreams.
       • BibTeX datastreams provide comprehensive metadata for items in CiteBank.
       • Images, PDFs, etc., are bound to BibTeX records via FOXML.
     • Other key elements of a DR plan, such as off-site backups, also need to be considered.
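
To illustrate the point that the data stays readable without Drupal, Biblio, or Fedora running, the sketch below recovers the file bindings of one digital object using nothing but a standard XML parser. The sample FOXML is a hand-written, simplified fragment (hypothetical PID and URLs), and `list_external_files` is a name invented for this example; if a datastream has several versions, only the first is read.

```python
import xml.etree.ElementTree as ET

NS = {"foxml": "info:fedora/fedora-system:def/foxml#"}

# Simplified, hand-written FOXML for illustration; values are hypothetical.
SAMPLE_FOXML = """<foxml:digitalObject VERSION="1.1" PID="citebank:42"
    xmlns:foxml="info:fedora/fedora-system:def/foxml#">
  <foxml:datastream ID="BIBTEX" STATE="A" CONTROL_GROUP="E">
    <foxml:datastreamVersion ID="BIBTEX.0" MIMETYPE="text/plain">
      <foxml:contentLocation TYPE="URL" REF="http://example.org/files/42.bib"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
  <foxml:datastream ID="PDF" STATE="A" CONTROL_GROUP="E">
    <foxml:datastreamVersion ID="PDF.0" MIMETYPE="application/pdf">
      <foxml:contentLocation TYPE="URL" REF="http://example.org/files/42.pdf"/>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>"""


def list_external_files(foxml_text):
    """Return (pid, {datastream_id: url}) for one FOXML document."""
    root = ET.fromstring(foxml_text)
    files = {}
    for ds in root.findall("foxml:datastream", NS):
        # Only datastreams that point at an external location are collected.
        loc = ds.find("foxml:datastreamVersion/foxml:contentLocation", NS)
        if loc is not None:
            files[ds.get("ID")] = loc.get("REF")
    return root.get("PID"), files
```

Running `list_external_files(SAMPLE_FOXML)` yields the PID together with a mapping from datastream IDs to file locations, which is the binding between a BibTeX record and its PDFs or images that the slide describes.
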
  13. Future Steps for REST API
     • Contribute code…
       • To Islandora?
       • To Biblio?
       • As a separate Drupal module?