The Open Archives Initiative


    The Open Archives Initiative - Presentation Transcript

    1. The Open Archives Initiative
       Michael L. Nelson
       Computer Science, Old Dominion University
       www.cs.odu.edu/~mln/
       www.openarchives.org
       The Open Archives Initiative DRIADE Workshop, Durham NC, May 16-17, 2007 Michael L. Nelson
    2. Open Archives Initiative Protocol for Metadata Harvesting
       • data providers / repositories:
         o “A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in [the OAI-PMH document]. A repository is managed by a data provider to expose metadata to harvesters.”
       • service providers / harvesters:
         o “A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.”
    3. Data Providers / Service Providers
       • data providers (repositories)
       • service providers (harvesters)
    4. Overview of OAI-PMH Verbs
       Verb                  Function
       Repository metadata verbs:
       Identify              description of repository
       ListMetadataFormats   metadata formats supported by repo
       ListSets              sets defined by repository
       Harvesting verbs:
       ListIdentifiers       OAI unique ids contained in repo
       ListRecords           listing of N records
       GetRecord             listing of a single record
       Most verbs take arguments: dates, sets, ids, metadata formats, and a resumption token (for flow control).
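As a rough illustration of how a harvester might turn the verbs and arguments above into a request, here is a minimal sketch. The function name `build_request` and the base URL in the usage note are hypothetical and not part of the slides; the only OAI-PMH facts assumed are the six verb names and that arguments travel as query parameters.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed on the slide.
OAI_PMH_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL (hypothetical helper, for illustration)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass from_ instead;
            # the trailing underscore is removed here.
            params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"
```

For example, `build_request("http://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc", from_="2007-01-01")` yields a ListRecords request restricted by date against that made-up repository.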
    5. OAI-PMH data model
       • resource
       • item: entry point to all records pertaining to the resource (OAI-PMH identifier, OAI-PMH sets)
       • records: metadata pertaining to the resource, e.g. Dublin Core, MARCXML (OAI-PMH identifier, metadataPrefix, datestamp)
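The item/record relationship in this data model can be sketched as two small classes. This is only an illustration under assumptions: the class and field names are invented here, and a real repository would store far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Metadata about a resource in one format (illustrative names)."""
    metadata_prefix: str   # e.g. "oai_dc" or "marcxml"
    datestamp: str         # date the record was created or last changed
    metadata: str          # the serialized metadata itself


@dataclass
class Item:
    """Entry point to all records pertaining to one resource."""
    identifier: str                              # OAI-PMH identifier
    sets: list = field(default_factory=list)     # sets the item belongs to
    records: dict = field(default_factory=dict)  # metadataPrefix -> Record

    def add_record(self, record):
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix):
        # Returns None when the item has no record in that format.
        return self.records.get(metadata_prefix)
```

The point of the sketch is that one item (one identifier) can carry several records, one per metadata format, each with its own datestamp.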
    6. Complexity Comes to OAI-PMH…
       • First noticed in how people would populate their Dublin Core records
         o people need the HTML splash page
         o crawlers need the PDF file
       • Ad-hoc conventions and methods used to expose the repository’s knowledge about the structure of the object
       • Next three slides taken from “Resource Harvesting Within the OAI-PMH Framework”
         o http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
    7. Dublin Core Encoding Type 1
       <oai_dc:dc>
         <dc:title>A Simple Parallel-Plate Resonator Technique for Microwave Characterization of Thin Resistive Films</dc:title>
         <dc:creator>Vorobiev, A.</dc:creator>
         <dc:subject>ING-INF/01 Elettronica</dc:subject>
         <dc:description>A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one ... </dc:description>
         <dc:publisher>Microwave engineering Europe</dc:publisher>
         <dc:date>2002</dc:date>
         <dc:type>Documento relativo ad una Conferenza o altro Evento</dc:type>
         <dc:type>PeerReviewed</dc:type>
         <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
         <dc:format>pdf http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:format>
       </oai_dc:dc>
       Callouts on the slide: splash page; locator of resource
    8. Dublin Core Encoding Type 2
       …
       <dc:identifier>http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
       <dc:relation> http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:relation>
       …
       Callouts on the slide: splash page; locator of resource
    9. Dublin Core Encoding Type 3
       …
       <dc:identifier> http://amsacta.cib.unibo.it/archive/00000014/</dc:identifier>
       <dc:relation> http://resolver.unibo.it/00000014/ </dc:relation>
       <dc:relation> http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf </dc:relation>
       …
       Callouts on the slide: splash page; locator of resource; splash page
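The three encodings show why a harvester ends up guessing. A sketch of such a guess follows: it collects every URL found in dc:identifier, dc:relation, and dc:format, and treats URLs ending in ".pdf" as resource locators and everything else as splash-page candidates. The function name and the ".pdf" rule are assumptions made for illustration; they are exactly the kind of ad-hoc convention the slides describe, not a method taken from them.

```python
import re
import xml.etree.ElementTree as ET

# Dublin Core element namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"
URL_PATTERN = re.compile(r"https?://\S+")


def find_locators(record_xml):
    """Return (splash_pages, resource_locators) guessed from an oai_dc record."""
    root = ET.fromstring(record_xml)
    splash_pages, resource_locators = [], []
    for tag in ("identifier", "relation", "format"):
        for element in root.iter(DC + tag):
            for url in URL_PATTERN.findall(element.text or ""):
                # Heuristic (an assumption): a ".pdf" suffix marks the resource.
                if url.lower().endswith(".pdf"):
                    resource_locators.append(url)
                else:
                    splash_pages.append(url)
    return splash_pages, resource_locators
```

The record passed in must declare the `oai_dc` and `dc` namespaces, which the slide excerpts omit. The heuristic fails as soon as the resource is not a PDF, which is part of the motivation for exposing object structure explicitly.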
    10. OAI Object Re-Use and Exchange
        • Develop, identify, and profile extensible standards and protocols to allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories.
        • Aim for more effective and consistent ways:
          o to facilitate discovery of these objects,
          o to reference (link to) these objects (and parts thereof),
          o to obtain a variety of disseminations of these objects,
          o to aggregate and disaggregate these objects,
          o to enable processing by automated agents
    11. The Structure of Compound Objects is Obfuscated When Mapped to the Web
    12. Useful for humans and useful for applications is often different
        Callout on the slide: HTTP LINK HEADER
    13. Through the Resource Map, the Web application sees the compound object
    14. This approach reveals compound objects in the Web graph
    15. OAI: It’s Not Just for Metadata Harvesting Anymore…
        OAI-PMH               OAI-ORE
        Repository structure  Object structure
        Metadata centric      Resource centric
        Metadata harvesting   Object re-use (obtain, harvest, register)
        OAI-PMH and OAI-ORE are complementary:
          o you can do one without the other
          o you can do them together
    16. OAI-ORE: Current Status
        • Ongoing definition of the ORE framework
          o Reach joint problem statement
          o Issues regarding identification
          o Model for ORE resource
          o Publishing ORE resources to the Web
          o Discovering ORE resources
        • Review of appropriate technologies for ORE Model and Resource Map
          o ATOM
          o DID/DIDL, IMS/CP, METS, Ramlet
          o RDF, RDF/XML
          o Dublin Core Abstract Model
          o …
    17. OAI-ORE: Current Status
        • Explore demonstrators using these concepts in preparation of May 2007 ORE Technical Committee meeting
        • Post May 2007 meeting:
          o Hopefully work towards alpha specs for ORE resource, Resource Map, discovery of ORE resource
          o Experimentation with alpha specs
    18. My research group’s approach to OAI/Preservation integration…
    19. Preservation: Fortress Model
        Five Easy Steps for Preservation:
        1. Get a lot of $
        2. Buy a lot of disks, machines, tapes, etc.
        3. Hire an army of staff
        4. Load a small amount of data
        5. “Look upon my archive ye Mighty, and despair!”
        image from: http://www.itunisie.com/tourisme/excursion/tabarka/images/fort.jpg
    20. Alternate Models of Preservation
        • Lazy Preservation
          o Let Google, IA et al. preserve your website
        • Just-In-Time Preservation
          o Wait for it to disappear first, then find a “good enough” version
        • Shared Infrastructure Preservation
          o Push your content to sites that might preserve it
        • Web Server Enhanced Preservation
          o Use Apache modules to create archival-ready resources
        image from: http://www.proex.ufes.br/arsm/knots_interlaced.htm
    21. Web Site Preservation: 2 Problems
        Image caption: Guess the bean count, win the jar
        • The counting problem
          o How many pages are on that site?
          o To save it you have to find it
        • The representation problem
          o What’s that page all about?
          o Future use requires understanding
    22. OAI-PMH Data Model
        • resource
        • item (OAI-PMH identifier) = entry point to all records pertaining to the resource
        • records: range from metadata pertaining to the resource to a modeled representation of the resource
        Formats shown on the slide: Dublin Core, MARCXML, METS, MPEG-21 DIDL
        Labels on the slide: simple model; complex model; complex model; more expressive model
    23. mod_oai implementation
        Integrate OAI-PMH functionality into the web server itself…
        1. Use mod_oai
           - an Apache 2.0 module
           - automatically answers OAI-PMH requests for an http server
           - written in C
           - respects values in .htaccess, httpd.conf
        2. Install mod_oai on http://www.foo.edu/
           Define baseURL: http://www.foo.edu/modoai
        3. Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets)
           http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg
           - http://www.foo.edu/modoai: from site foo, using OAI-PMH
           - verb=ListRecords: give me all resources
           - metadataPrefix=oai_didl: and their preservation metadata
           - from=2004-09-15: dating from 9/15/2004 through today
           - set=mime:video:mpeg: that are MIME type video-MPEG
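To make the anatomy of that request concrete, the following sketch splits a mod_oai style URL into its baseURL and OAI-PMH arguments, and breaks the set value on ":" (the separator OAI-PMH uses for set hierarchies). The function name `describe_request` is hypothetical; the URL in the usage note is the slide's example with the argument spelled `metadataPrefix`, as OAI-PMH defines it.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_request(url):
    """Split an OAI-PMH request URL into (base_url, arguments, set_path)."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    arguments = dict(parse_qsl(parts.query))
    # A set such as "mime:video:mpeg" reads as a path: mime > video > mpeg.
    set_path = arguments["set"].split(":") if "set" in arguments else []
    return base_url, arguments, set_path
```

Calling it on `http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg` separates the baseURL from the four arguments. No `until` argument is present, which is why the request covers everything from 2004-09-15 through today.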
    24. Addressing the Counting Problem: ListIdentifiers
        CRAWLER:
        • issues a ListIdentifiers,
        • finds URLs of updated resources
        • does HTTP GET updates only
        • can get URLs of resources with specified MIME types
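The crawler loop above might look roughly like the sketch below: parse each ListIdentifiers response, collect the headers, and keep going while the repository returns a resumption token. The `fetch` callable is a stand-in supplied by the caller (for example a thin wrapper around an HTTP client) and is an assumption of this sketch, as are the function names; the XML element names and namespace are those of OAI-PMH 2.0 responses.

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 response namespace, in ElementTree's {uri}tag notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_identifiers(response_xml):
    """Return (headers, resumption_token) from one ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iter(OAI + "header"):
        headers.append({
            "identifier": header.findtext(OAI + "identifier"),
            "datestamp": header.findtext(OAI + "datestamp"),
            "sets": [s.text for s in header.findall(OAI + "setSpec")],
        })
    token = root.findtext(f".//{OAI}resumptionToken")
    # A missing or empty token means the list is complete.
    return headers, (token or None)


def harvest_identifiers(fetch, **arguments):
    """Yield every header, following resumption tokens.

    fetch: caller-supplied function (hypothetical) that takes a dict of
    OAI-PMH arguments and returns the response XML as text.
    """
    params = {"verb": "ListIdentifiers", **arguments}
    while True:
        headers, token = parse_list_identifiers(fetch(params))
        yield from headers
        if not token:
            break
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

A crawler would then issue an HTTP GET only for the identifiers returned, passing `from` (via `**{"from": "2004-09-15"}`, since `from` is a Python keyword) and `set` to narrow the list to updated resources of a given MIME type.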
    25. Addressing the Representation Problem: ListRecords in DIDL Format
        CRAWLER:
        • makes a ListRecords query,
        • gets updates as MPEG-21 DIDL records (HTTP headers, resource By Value or By Reference)
        • can get resources with specified MIME types
    26. CRATE: Preservation Metadata at Dissemination Time
        • Harnesses web server to support preservation
        • Moves preservation metadata from “strict validation at ingest” to “best-effort description at dissemination”
        Table headers on the slide: Plug-in Name; Executable path
    27. Validation is Subjective
        Preservation metadata is like a David Hockney photo collage: each image is both true and incomplete, and while the result is not faithful, it does capture the “essence”
        images from: http://facweb.cs.depaul.edu/sgrais/collage.htm

    Michael Nelson, 4 months ago

    Presented at: Digital data preservation, sharing, a more


    © All Rights Reserved
