The Open Archives Initiative
Presented at: Digital data preservation, sharing, and discovery: Challenges for Small Science Communities in the Digital Era, Durham NC (5/16/07)

Published in: Technology, Education
Transcript

  1. The Open Archives Initiative Michael L. Nelson Computer Science, Old Dominion University www.cs.odu.edu/~mln/ www.openarchives.org The Open Archives Initiative DRIADE Workshop, Durham NC, May 16-17, 2007 Michael L. Nelson
  2. Open Archives Initiative Protocol for Metadata Harvesting
     • data providers / repositories:
       o “A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in [the OAI-PMH document]. A repository is managed by a data provider to expose metadata to harvesters.”
     • service providers / harvesters:
       o “A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.”
  3. Data Providers / Service Providers [figure: data providers (repositories) and service providers (harvesters)]
  4. Overview of OAI-PMH Verbs
     • repository metadata verbs:
       o Identify: description of repository
       o ListMetadataFormats: metadata formats supported by repo
       o ListSets: sets defined by repository
     • harvesting verbs:
       o ListIdentifiers: OAI unique ids contained in repo
       o ListRecords: listing of N records
       o GetRecord: listing of a single record
     • most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
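As a rough illustration of how a harvester might form requests for the six verbs on this slide, the sketch below builds request URLs with Python's standard library. The base URL `http://repo.example.org/oai` and the helper name `build_request` are hypothetical and chosen only for this example; the verb and argument names are those of OAI-PMH.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed on the slide.
VERBS = (
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
)


def build_request(base_url, verb, **args):
    """Return an OAI-PMH request URL (hypothetical helper, not a library API).

    'from' is a reserved word in Python, so pass that argument as 'from_'.
    """
    if verb not in VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    params = {"verb": verb}
    for name, value in args.items():
        if value is not None:
            # Strip the trailing underscore used to dodge Python keywords.
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


BASE = "http://repo.example.org/oai"  # assumed base URL, for illustration only
print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", metadataPrefix="oai_dc", from_="2004-09-15"))
```

Note that `urlencode` percent-encodes reserved characters, so a set name containing colons would appear with `%3A` in the generated URL.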
  5. OAI-PMH data model
     • resource
     • item: entry point to all records pertaining to the resource (OAI-PMH identifier, OAI-PMH sets)
     • records: metadata pertaining to the resource, e.g. Dublin Core, MARCXML (OAI-PMH identifier, metadataPrefix, datestamp)
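One possible way to picture this data model in code is sketched below, assuming that an item is the entry point carrying the identifier and set membership, and that each record is distinguished by identifier, metadataPrefix and datestamp. The class names, the identifier `oai:example.org:0001`, the set name and the prefix values are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    identifier: str       # OAI-PMH identifier of the item the record belongs to
    metadata_prefix: str  # names the metadata format, e.g. "oai_dc"
    datestamp: str        # when this record was created or last changed
    metadata: str         # the serialized metadata itself


@dataclass
class Item:
    identifier: str
    sets: list = field(default_factory=list)
    records: dict = field(default_factory=dict)  # keyed by metadataPrefix

    def add_record(self, metadata_prefix, datestamp, metadata):
        """Attach one metadata record in the given format to this item."""
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical item with two records describing the same resource.
item = Item("oai:example.org:0001", sets=["physics"])
item.add_record("oai_dc", "2007-05-16", "<oai_dc:dc>...</oai_dc:dc>")
item.add_record("marcxml", "2007-05-16", "<record>...</record>")
print(sorted(item.records))
```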
  6. Complexity Comes to OAI-PMH…
     • First noticed in how people would populate their Dublin Core records
       o people need the HTML splash page
       o crawlers need the PDF file
     • Ad-hoc conventions and methods used to expose the repository’s knowledge about the structure of the object
     • Next three slides taken from “Resource Harvesting Within the OAI-PMH Framework”
       o http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
  7. Dublin Core Encoding Type 1
     <oai_dc:dc>
       <dc:title>A Simple Parallel-Plate Resonator Technique for Microwave. Characterization of Thin Resistive Films</dc:title>
       <dc:creator>Vorobiev, A.</dc:creator>
       <dc:subject>ING-INF/01 Elettronica</dc:subject>
       <dc:description>A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one ... </dc:description>
       <dc:publisher>Microwave engineering Europe</dc:publisher>
       <dc:date>2002</dc:date>
       <dc:type>Documento relativo ad una Conferenza o altro Evento</dc:type>
       <dc:type>PeerReviewed</dc:type>
       <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
       <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:format>
     </oai_dc:dc>
     Slide annotations: dc:identifier holds the splash page; dc:format holds the locator of the resource
  8. Dublin Core Encoding Type 2
     …
     <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
     <dc:relation> http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:relation>
     …
     Slide annotations: dc:identifier holds the splash page; dc:relation holds the locator of the resource
  9. Dublin Core Encoding Type 3
     …
     <dc:identifier> http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
     <dc:relation> http://resolver.unibo.it/00000014/ </dc:relation>
     <dc:relation> http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:relation>
     …
     Slide annotations (in slide order): splash page, locator of resource, splash page
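The three encodings illustrate why a harvester cannot know in advance which Dublin Core element carries the resource itself. A minimal sketch of the kind of guesswork this forces is shown below: it scans dc:identifier, dc:relation and dc:format for URLs and labels each one with a file-extension heuristic. The function name `find_locators` and the ".pdf means resource" rule are assumptions made for this example, not a convention from the slides.

```python
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace
URL_RE = re.compile(r"https?://\S+")


def find_locators(record_xml):
    """Return (element, guess, url) tuples for every URL in the record.

    The guess is a naive heuristic: URLs ending in .pdf are treated as the
    resource, everything else as a possible splash page.
    """
    root = ET.fromstring(record_xml.strip())
    found = []
    for name in ("identifier", "relation", "format"):
        for element in root.iter(DC + name):
            for url in URL_RE.findall(element.text or ""):
                is_pdf = url.lower().endswith(".pdf")
                guess = "likely resource" if is_pdf else "likely splash page"
                found.append((name, guess, url))
    return found


# Abbreviated Type 1 record from the slide, with namespace declarations added.
TYPE_1 = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
  <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:format>
</oai_dc:dc>
"""

for row in find_locators(TYPE_1):
    print(row)
```

A heuristic like this breaks as soon as a repository uses a different element or a resolver URL, which is the motivation for describing object structure explicitly.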
  10. OAI Object Re-Use and Exchange
     • Develop, identify, and profile extensible standards and protocols to allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories.
     • Aim for more effective and consistent ways:
       o to facilitate discovery of these objects,
       o to reference (link to) these objects (and parts thereof),
       o to obtain a variety of disseminations of these objects,
       o to aggregate and disaggregate these objects,
       o to enable processing by automated agents
  11. The Structure of Compound Objects is Obfuscated When Mapped to the Web
  12. Useful for humans and useful for applications is often different [figure label: HTTP LINK HEADER]
  13. Through the Resource Map, the Web application sees the compound object
  14. This approach reveals compound objects in the Web graph
  15. OAI: It’s Not Just for Metadata Harvesting Anymore…
     OAI-PMH                 OAI-ORE
     Repository structure    Object structure
     Metadata centric        Resource centric
     Metadata harvesting     Object re-use (obtain, harvest, register)
     • OAI-PMH and OAI-ORE are complementary:
       o you can do one without the other
       o you can do them together
  16. OAI-ORE : Current Status
     • Ongoing definition of the ORE framework
       o Reach joint problem statement
       o Issues regarding identification
       o Model for ORE resource
       o Publishing ORE resources to the Web
       o Discovering ORE resources
     • Review of appropriate technologies for ORE Model and Resource Map
       o ATOM
       o DID/DIDL, IMS/CP, METS, Ramlet
       o RDF, RDF/XML
       o Dublin Core Abstract Model
       o …
  17. OAI-ORE : Current Status
     • Explore demonstrators using these concepts in preparation for the May 2007 ORE Technical Committee meeting
     • Post May 2007 meeting:
       o Hopefully work towards alpha specs for ORE resource, Resource Map, discovery of ORE resource
       o Experimentation with alpha specs
  18. My research group’s approach to OAI/Preservation integration…
  19. Preservation: Fortress Model
     Five Easy Steps for Preservation:
     1. Get a lot of $
     2. Buy a lot of disks, machines, tapes, etc.
     3. Hire an army of staff
     4. Load a small amount of data
     5. “Look upon my archive ye Mighty, and despair!”
     image from: http://www.itunisie.com/tourisme/excursion/tabarka/images/fort.jpg
  20. Alternate Models of Preservation
     • Lazy Preservation
       o Let Google, IA et al. preserve your website
     • Just-In-Time Preservation
       o Wait for it to disappear first, then a “good enough” version
     • Shared Infrastructure Preservation
       o Push your content to sites that might preserve it
     • Web Server Enhanced Preservation
       o Use Apache modules to create archival-ready resources
     image from: http://www.proex.ufes.br/arsm/knots_interlaced.htm
  21. Web Site Preservation: 2 Problems
     • The counting problem
       o How many pages are on that site?
       o To save it you have to find it
       o [image caption: Guess the bean count, win the jar]
     • The representation problem
       o What’s that page all about?
       o Future use requires understanding
  22. OAI-PMH Data Model
     • resource
     • item = entry point to all records pertaining to the resource (OAI-PMH identifier)
     • records: either metadata pertaining to the resource (Dublin Core, MARCXML) or a modeled representation of the resource (METS, MPEG-21 DIDL)
     • formats range from a simple model to complex, more expressive models
  23. mod_oai implementation
     Integrate OAI-PMH functionality into the web server itself…
     1. Use mod_oai
        - an Apache 2.0 module
        - automatically answers OAI-PMH requests for an http server
        - written in C
        - respects values in .htaccess, httpd.conf
     2. Install mod_oai on http://www.foo.edu/
        Define baseURL: http://www.foo.edu/modoai
     3. Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets)
        http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg
        - http://www.foo.edu/ : from site foo
        - modoai : using OAI-PMH
        - verb=ListRecords : give me all resources
        - metadataPrefix=oai_didl : and their preservation metadata
        - from=2004-09-15 : dating from 9/15/2004 through today
        - set=mime:video:mpeg : that are MIME type video-MPEG
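To make the reading of that request concrete, the sketch below splits the example URL into its OAI-PMH arguments using only Python's standard library, with the argument spelled `metadataPrefix` as the protocol defines it. The function name `describe` and the shape of the returned dictionary are assumptions for illustration; nothing here contacts a server.

```python
from urllib.parse import urlsplit, parse_qs

# The example request from the slide.
URL = ("http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl"
       "&from=2004-09-15&set=mime:video:mpeg")


def describe(url):
    """Break an OAI-PMH request URL into base URL and arguments."""
    parts = urlsplit(url)
    # parse_qs returns lists of values; each argument appears once here.
    args = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "baseURL": parts.scheme + "://" + parts.netloc + parts.path,
        "verb": args.get("verb"),
        "metadataPrefix": args.get("metadataPrefix"),
        "from": args.get("from"),
        "until": args.get("until"),  # absent here, i.e. "through today"
        "set": args.get("set"),
    }


for key, value in describe(URL).items():
    print(key, "=", value)
```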
  24. Addressing the Counting Problem: ListIdentifiers
     CRAWLER:
     • issues a ListIdentifiers,
     • finds URLs of updated resources
     • does HTTP GET updates only
     • can get URLs of resources with specified MIME types
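The crawler steps above can be sketched as follows, assuming the repository uses resource URLs as OAI-PMH identifiers and MIME-based set names like the one in the earlier example request. The sample response is hand-written and abbreviated (a real response also carries `responseDate` and `request` elements), and the function name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH response namespace

# Hand-written, abbreviated ListIdentifiers response for illustration.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>http://www.foo.edu/index.html</identifier>
      <datestamp>2004-09-16</datestamp>
      <setSpec>mime:text:html</setSpec>
    </header>
    <header>
      <identifier>http://www.foo.edu/talk.mpg</identifier>
      <datestamp>2004-10-02</datestamp>
      <setSpec>mime:video:mpeg</setSpec>
    </header>
    <resumptionToken>token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_list_identifiers(xml_text):
    """Return (headers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_text.strip())
    headers = []
    for header in root.iter(OAI + "header"):
        headers.append({
            "identifier": header.findtext(OAI + "identifier"),
            "datestamp": header.findtext(OAI + "datestamp"),
            "sets": [s.text for s in header.findall(OAI + "setSpec")],
        })
    token = root.findtext(".//" + OAI + "resumptionToken")
    return headers, (token or None)


headers, token = parse_list_identifiers(SAMPLE)
# URLs the crawler would then fetch with ordinary HTTP GET requests.
videos = [h["identifier"] for h in headers if "mime:video:mpeg" in h["sets"]]
print(videos, token)
```

A non-empty resumption token signals that the list is incomplete and that the harvester should issue a follow-up request to continue.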
  25. Addressing the Representation Problem: ListRecords in DIDL Format
     CRAWLER:
     • Makes a ListRecords query,
     • Gets updates as MPEG-21 DIDL records (HTTP headers, resource By Value or By Reference)
     • can get resources with specified MIME types
  26. CRATE: Preservation Metadata at Dissemination Time
     • Harnesses web server to support preservation
     • Moves preservation metadata from “strict validation at ingest” to “best-effort description at dissemination”
     [figure: table with columns Plug-in Name, Executable path]
  27. Validation is Subjective
     Preservation metadata is like a David Hockney photo collage: each image is both true and incomplete, and while the result is not faithful, it does capture the “essence”
     images from: http://facweb.cs.depaul.edu/sgrais/collage.htm
