The Open Archives Initiative

Presented at: Digital data preservation, sharing, and discovery: Challenges for Small Science Communities in the Digital Era, Durham NC (5/16/07)

The Open Archives Initiative: Presentation Transcript

  • The Open Archives Initiative Michael L. Nelson Computer Science, Old Dominion University www.cs.odu.edu/~mln/ www.openarchives.org The Open Archives Initiative DRIADE Workshop, Durham NC, May 16-17, 2007 Michael L. Nelson
  • Open Archives Initiative Protocol for Metadata Harvesting
      • data providers / repositories:
          o “A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in [the OAI-PMH document]. A repository is managed by a data provider to expose metadata to harvesters.”
      • service providers / harvesters:
          o “A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.”
  • Data Providers / Service Providers
      [diagram: data providers (repositories) and service providers (harvesters)]
  • Overview of OAI-PMH Verbs
      Verb                    Function
      repository metadata:
        Identify              description of repository
        ListMetadataFormats   metadata formats supported by repository
        ListSets              sets defined by repository
      harvesting verbs:
        ListIdentifiers       OAI unique ids contained in repository
        ListRecords           listing of N records
        GetRecord             listing of a single record
      Most verbs take arguments: dates, sets, ids, metadata formats, and a resumption token (for flow control).
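The verb table above can be illustrated with a small request builder. This is a minimal sketch, not part of the talk: the function name `build_request` and the `REQUIRED` table are hypothetical, and the base URL is the example one used later in the deck. The required arguments per verb follow the OAI-PMH 2.0 specification; `resumptionToken` is treated as an exclusive argument that replaces the others.

```python
from urllib.parse import urlencode

# Required arguments for each of the six OAI-PMH verbs.
REQUIRED = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def build_request(base_url, verb, args=None):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    args = dict(args or {})
    if verb not in REQUIRED:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # A resumptionToken continues an earlier list request and
    # stands in for the other arguments (flow control).
    if "resumptionToken" not in args:
        missing = REQUIRED[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} is missing: {sorted(missing)}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


if __name__ == "__main__":
    print(build_request("http://www.foo.edu/modoai", "Identify"))
    print(build_request(
        "http://www.foo.edu/modoai",
        "ListRecords",
        {"metadataPrefix": "oai_didl", "from": "2004-09-15"},
    ))
```

Arguments are passed as a dictionary because `from` is a reserved word in Python and cannot be used as a keyword argument.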
  • OAI-PMH data model
      o resource
      o item: entry point to all records pertaining to the resource; identified by an OAI-PMH identifier; may belong to OAI-PMH sets
      o records: metadata pertaining to the resource (e.g., Dublin Core, MARCXML); identified by OAI-PMH identifier, metadataPrefix, and datestamp
  • Complexity Comes to OAI-PMH…
      • First noticed in how people would populate their Dublin Core records
          o people need the HTML splash page
          o crawlers need the PDF file
      • Ad-hoc conventions and methods used to expose the repository’s knowledge about the structure of the object
      • Next three slides taken from “Resource Harvesting Within the OAI-PMH Framework”
          o http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
  • Dublin Core Encoding Type 1
      <oai_dc:dc>
        <dc:title>A Simple Parallel-Plate Resonator Technique for Microwave. Characterization of Thin Resistive Films</dc:title>
        <dc:creator>Vorobiev, A.</dc:creator>
        <dc:subject>ING-INF/01 Elettronica</dc:subject>
        <dc:description>A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one ... </dc:description>
        <dc:publisher>Microwave engineering Europe</dc:publisher>
        <dc:date>2002</dc:date>
        <dc:type>Documento relativo ad una Conferenza o altro Evento</dc:type>
        <dc:type>PeerReviewed</dc:type>
        <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
        <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:format>
      </oai_dc:dc>
      Slide annotations: splash page (dc:identifier); locator of resource (URL in dc:format)
  • Dublin Core Encoding Type 2
      …
      <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
      <dc:relation>http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:relation>
      …
      Slide annotations: splash page (dc:identifier); locator of resource (dc:relation)
  • Dublin Core Encoding Type 3
      …
      <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
      <dc:relation>http://resolver.unibo.it/00000014/</dc:relation>
      <dc:relation>http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:relation>
      …
      Slide annotations (in slide order): splash page; locator of resource; splash page
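The three encoding types show why a harvester cannot rely on a single Dublin Core element to find the resource itself. As a rough sketch (not taken from the talk or the D-Lib article), the following collects every URL found in `dc:identifier`, `dc:relation`, and `dc:format`. The function name `candidate_urls` and the trimmed `RECORD` are illustrative assumptions; the namespace URIs are the standard Dublin Core and oai_dc ones.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Trimmed version of the Type 1 record, with namespace declarations added.
RECORD = f"""<oai_dc:dc xmlns:oai_dc="{OAI_DC}" xmlns:dc="{DC}">
  <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
  <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf</dc:format>
</oai_dc:dc>"""

URL_RE = re.compile(r"https?://\S+")


def candidate_urls(xml_text):
    """Return (element name, URL) pairs from the elements that the
    three encoding conventions use to carry locators."""
    root = ET.fromstring(xml_text)
    found = []
    for name in ("identifier", "relation", "format"):
        for element in root.findall(f"{{{DC}}}{name}"):
            for url in URL_RE.findall(element.text or ""):
                found.append((name, url))
    return found


if __name__ == "__main__":
    for name, url in candidate_urls(RECORD):
        print(name, url)
```

The sketch only gathers candidates. Deciding which URL is the splash page and which is the resource still depends on knowing the convention a given repository follows, which is the problem the slides describe.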
  • OAI Object Re-Use and Exchange
      • Develop, identify, and profile extensible standards and protocols to allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories.
      • Aim for more effective and consistent ways:
          o to facilitate discovery of these objects,
          o to reference (link to) these objects (and parts thereof),
          o to obtain a variety of disseminations of these objects,
          o to aggregate and disaggregate these objects,
          o to enable processing by automated agents
  • The Structure of Compound Objects is Obfuscated When Mapped to the Web
  • Useful for humans and useful for applications is often different
      [figure label: HTTP LINK HEADER]
  • Through the Resource Map, the Web application sees the compound object
  • This approach reveals compound objects in the Web graph
  • OAI: It's Not Just for Metadata Harvesting Anymore…
      OAI-PMH                 OAI-ORE
      Repository structure    Object structure
      Metadata centric        Resource centric
      Metadata harvesting     Object re-use (obtain, harvest, register)
      OAI-PMH and OAI-ORE are complementary:
          o you can do one without the other
          o you can do them together
  • OAI-ORE: Current Status
      • Ongoing definition of the ORE framework
          o Reach joint problem statement
          o Issues regarding identification
          o Model for ORE resource
          o Publishing ORE resources to the Web
          o Discovering ORE resources
      • Review of appropriate technologies for ORE Model and Resource Map
          o ATOM
          o DID/DIDL, IMS/CP, METS, Ramlet
          o RDF, RDF/XML
          o Dublin Core Abstract Model
          o …
  • OAI-ORE: Current Status (continued)
      • Explore demonstrators using these concepts in preparation for the May 2007 ORE Technical Committee meeting
      • Post May 2007 meeting:
          o Hopefully work towards alpha specs for ORE resource, Resource Map, discovery of ORE resource
          o Experimentation with alpha specs
  • My research group’s approach to OAI/Preservation integration…
  • Preservation: Fortress Model
      Five Easy Steps for Preservation:
      1. Get a lot of $
      2. Buy a lot of disks, machines, tapes, etc.
      3. Hire an army of staff
      4. Load a small amount of data
      5. “Look upon my archive ye Mighty, and despair!”
      image from: http://www.itunisie.com/tourisme/excursion/tabarka/images/fort.jpg
  • Alternate Models of Preservation
      • Lazy Preservation
          o Let Google, IA et al. preserve your website
      • Just-In-Time Preservation
          o Wait for it to disappear first, then find a “good enough” version
      • Shared Infrastructure Preservation
          o Push your content to sites that might preserve it
      • Web Server Enhanced Preservation
          o Use Apache modules to create archival-ready resources
      image from: http://www.proex.ufes.br/arsm/knots_interlaced.htm
  • Web Site Preservation: 2 Problems
      [image caption: Guess the bean count, win the jar]
      • The counting problem: How many pages are on that site? To save it you have to find it.
      • The representation problem: What’s that page all about? Future use requires understanding.
  • OAI-PMH Data Model
      o resource
      o item = entry point to all records pertaining to the resource (OAI-PMH identifier)
      o records: Dublin Core, MARCXML, METS, MPEG-21 DIDL
      [diagram labels, in slide order: metadata pertaining to the resource; modeled representation of the resource; simple model, complex model, complex model, more expressive model]
  • mod_oai implementation
      Integrate OAI-PMH functionality into the web server itself…
      1. Use mod_oai
          - an Apache 2.0 module
          - automatically answers OAI-PMH requests for an http server
          - written in C
          - respects values in .htaccess, httpd.conf
      2. Install mod_oai on http://www.foo.edu/
          Define baseURL: http://www.foo.edu/modoai
      3. Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets)
          http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg
          Reading the request: from site foo, using OAI-PMH, give me all resources (verb=ListRecords) and their preservation metadata (metadataPrefix=oai_didl), dating from 9/15/2004 through today (from=2004-09-15), that are MIME type video-MPEG (set=mime:video:mpeg).
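To make the annotated request on this slide concrete, here is a small sketch that splits the example URL into the pieces the slide describes. The function name `describe` and the returned dictionary keys are hypothetical. The query parameter is spelled `metadataPrefix`, as the OAI-PMH specification names it, and the set value is split on `:` only to display the MIME hierarchy the slide refers to.

```python
from urllib.parse import urlsplit, parse_qs

REQUEST = ("http://www.foo.edu/modoai?verb=ListRecords"
           "&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg")


def describe(url):
    """Break an OAI-PMH request URL into base URL, verb, and arguments."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; these arguments occur once each.
    args = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "base_url": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "verb": args["verb"],
        "metadata_prefix": args.get("metadataPrefix"),
        "from": args.get("from"),
        "until": args.get("until"),  # absent here, i.e. "through today"
        "set_path": args["set"].split(":") if "set" in args else [],
    }


if __name__ == "__main__":
    for key, value in describe(REQUEST).items():
        print(f"{key}: {value}")
```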
  • Addressing the Counting Problem: ListIdentifiers
      CRAWLER:
      • issues a ListIdentifiers request,
      • finds URLs of updated resources
      • does HTTP GET for updates only
      • can get URLs of resources with specified MIME types
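The crawler steps above can be sketched as a parser for a ListIdentifiers response. This is an illustration only: `SAMPLE` is a made-up response (not real mod_oai output), the function name `parse_list_identifiers` is hypothetical, and it assumes that each OAI identifier is the URL of the resource, as the slide suggests. The element names and namespace come from the OAI-PMH 2.0 schema.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response body, trimmed to the parts the crawler needs.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>http://www.foo.edu/a.mpg</identifier>
      <datestamp>2004-09-20</datestamp>
      <setSpec>mime:video:mpeg</setSpec>
    </header>
    <header status="deleted">
      <identifier>http://www.foo.edu/b.mpg</identifier>
      <datestamp>2004-10-01</datestamp>
    </header>
    <resumptionToken>token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_list_identifiers(xml_text):
    """Return (headers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.findall("oai:ListIdentifiers/oai:header", NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "deleted": header.get("status") == "deleted",
        })
    token = root.findtext(
        "oai:ListIdentifiers/oai:resumptionToken", namespaces=NS)
    # An empty or missing token means the list is complete.
    return headers, (token or None)


if __name__ == "__main__":
    headers, token = parse_list_identifiers(SAMPLE)
    to_fetch = [h["identifier"] for h in headers if not h["deleted"]]
    print("HTTP GET candidates:", to_fetch)
    print("resumption token:", token)
```

A crawler would issue HTTP GET only for the non-deleted identifiers and, when a token is returned, repeat the request with `resumptionToken` to retrieve the next batch.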
  • Addressing the Representation Problem: ListRecords in DIDL Format
      CRAWLER:
      • makes a ListRecords query,
      • gets updates as MPEG-21 DIDL records (HTTP headers, resource By Value or By Reference)
      • can get resources with specified MIME types
  • CRATE: Preservation Metadata at Dissemination Time
      • Harnesses web server to support preservation
      • Moves preservation metadata from “strict validation at ingest” to “best-effort description at dissemination”
      [figure labels: Plug-in Name, Executable path]
  • Validation is Subjective
      Preservation metadata is like a David Hockney photo collage: each image is both true and incomplete, and while the result is not faithful, it does capture the “essence”
      images from: http://facweb.cs.depaul.edu/sgrais/collage.htm