Open Archives Initiative Protocol for Metadata Harvesting
Published on

Dublin Core conference 2009 Seoul, Oct 2009

Published in: Technology

Transcript

  • 1. The Open Archives Initiative Protocol for Metadata Harvesting. Muriel Foulonneau, Tudor Research Centre, [email_address]. 10/2009, Dublin Core conference 2009, Seoul
  • 2. The protocol was born
    • To create a minimal layer of interoperability between distributed repositories of scientific publications
      • An alternative to federated search
      • Networking of digital repositories
  • 3. “OAI divides the world between data providers and service providers”
  • 4. Sharing metadata : Data aggregation
    • The portal gathers metadata and implements its own retrieval system
    • E.g. search engines, union catalogs, OAI
    [Diagram: sample metadata record, <title>My resource</title>]
  • 5. The OAI framework
    • Mechanisms to transfer large datasets
      • Resumption tokens
      • Incremental harvesting
    [Diagram labels: service provider, harvester, portal interface, aggregator, data providers, repositories]
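A minimal sketch of how a harvester might follow resumption tokens to pull a large list in chunks. The names `harvest_all`, `fake_fetch` and `_page` are hypothetical, and the canned responses and `oai:example.org` identifiers stand in for a real repository, so no network call is made.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_all(fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumption tokens.

    `fetch` is a hypothetical callable: it takes a dict of request
    parameters and returns the XML response as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the last chunk of the list.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def _page(ids, token):
    """Build a tiny canned ListRecords response for the demo."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


def fake_fetch(params):
    """Stand-in for an HTTP call: two pages, the second ends the list."""
    if params.get("resumptionToken") == "page2":
        return _page(["oai:example.org:3"], "")
    return _page(["oai:example.org:1", "oai:example.org:2"], "page2")


identifiers = [
    rec.find(f"{OAI_NS}header/{OAI_NS}identifier").text
    for rec in harvest_all(fake_fetch)
]
print(identifiers)
```

In a real harvester, `fetch` would issue an HTTP GET against the repository's base URL. Keeping it as a parameter makes the paging logic easy to test offline.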
  • 6. Incremental harvest
    • Harvester to data providers: “What’s new since the last time I came?”
    • New or modified records
    • Deleted records
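The incremental harvest described above can be sketched as follows, assuming the harvester stores the date of its previous visit and passes it as the protocol's `from` argument. The helper names, the `http://example.org/oai` base URL and the sample response are illustrative assumptions. Deleted records are recognised by `status="deleted"` on the record header, which only repositories that track deletions will send.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords URL limited to records changed since `last_harvest`."""
    query = urlencode(
        {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "from": last_harvest,
        }
    )
    return f"{base_url}?{query}"


def split_changes(response_xml):
    """Return (updated_ids, deleted_ids) from a ListRecords response."""
    updated, deleted = [], []
    for header in ET.fromstring(response_xml).iter(OAI_NS + "header"):
        identifier = header.find(OAI_NS + "identifier").text
        # Repositories that track deletions flag them on the header.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted


# Hypothetical response: one modified record and one deleted record.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:example.org:1</identifier>
<datestamp>2009-10-02</datestamp></header></record>
<record><header status="deleted"><identifier>oai:example.org:2</identifier>
<datestamp>2009-10-03</datestamp></header></record>
</ListRecords></OAI-PMH>"""

url = incremental_url("http://example.org/oai", "2009-10-01")
updated, deleted = split_changes(sample)
print(url)
print(updated, deleted)
```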
  • 7. OAI is based on standards
    • HTTP protocol
    • XML and XML Schemas
    • Dublin Core
  • 8. Multiple representations of an object
    • Formats: Dublin Core, MARC21, MODS
    • Example: School of arts for girls [Kiz Sanayi Mektebi], oai:lcoa1.loc.gov:loc.pnp/cph.3b23005
  • 9. OAI repositories can be organized in sets
    • Enable selective harvesting
    • Sets can overlap: 1 item in multiple sets
    • Can be described (e.g. with DC or DC Collection)
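To illustrate overlapping sets, here is a small sketch that groups record identifiers by the `setSpec` values in their headers. The sample ListIdentifiers response, the set names and the `records_by_set` helper are made up for illustration. A selective harvest would add a `set` argument to a ListRecords or ListIdentifiers request.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical ListIdentifiers response: record 1 belongs to two sets.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.org:1</identifier>
<datestamp>2009-10-01</datestamp>
<setSpec>maps</setSpec><setSpec>photos</setSpec></header>
<header><identifier>oai:example.org:2</identifier>
<datestamp>2009-10-01</datestamp>
<setSpec>photos</setSpec></header>
</ListIdentifiers></OAI-PMH>"""


def records_by_set(response_xml):
    """Map each setSpec to the identifiers of the records it contains."""
    groups = defaultdict(list)
    for header in ET.fromstring(response_xml).iter(OAI_NS + "header"):
        identifier = header.find(OAI_NS + "identifier").text
        # A header may carry several setSpec elements (overlapping sets).
        for spec in header.findall(OAI_NS + "setSpec"):
            groups[spec.text].append(identifier)
    return dict(groups)


groups = records_by_set(sample)
print(groups)
```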
  • 10. OAI supports 6 verbs
    • Identify
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
    • ListSets
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets
    • ListRecords
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
    • ListMetadataFormats
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats
    • ListIdentifiers
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc
    • GetRecord
      • http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
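The request URLs above all follow one pattern: a base URL, a `verb`, and a few arguments. A minimal sketch of a URL builder follows, where `oai_url` and the `REQUIRED` table are hypothetical names. The table is a simplification: it ignores optional arguments and the case where a `resumptionToken` replaces `metadataPrefix`. The code only builds strings and sends no request.

```python
from urllib.parse import urlencode

BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"

# Arguments each verb needs in the example requests on the slide.
REQUIRED = {
    "Identify": (),
    "ListSets": (),
    "ListMetadataFormats": (),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
    "GetRecord": ("identifier", "metadataPrefix"),
}


def oai_url(verb, **args):
    """Return the request URL for `verb`, checking the required arguments."""
    if verb not in REQUIRED:
        raise ValueError(f"unknown verb: {verb}")
    missing = [name for name in REQUIRED[verb] if name not in args]
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(missing)}")
    # Keep ':' unescaped so OAI identifiers stay readable in the URL.
    return f"{BASE_URL}?{urlencode({'verb': verb, **args}, safe=':')}"


get_record = oai_url(
    "GetRecord",
    identifier="oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
    metadataPrefix="oai_dc",
)
print(oai_url("Identify"))
print(get_record)
```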
  • 11. An OAI response
    <record>
      <header>
        <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
        <datestamp>2003-10-22</datestamp>
        <setSpec>emblems</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
          <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    • The optional about section is often not used
      • E.g. to state rights on the metadata record
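A short sketch of how a service provider might read such a record with Python's standard library. The record is the one from the slide, with one assumption: the OAI-PMH default namespace is declared on `<record>`, as it would be inherited from the enclosing response in practice. The `summarize` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def summarize(record_xml):
    """Pull the header fields and the Dublin Core creators out of one record."""
    record = ET.fromstring(record_xml)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS).strip(),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS).strip(),
        "sets": [s.text.strip() for s in header.findall("oai:setSpec", NS)],
        "creators": [
            c.text.strip() for c in record.iterfind(".//dc:creator", NS)
        ],
    }


summary = summarize(RECORD)
print(summary)
```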
  • 12. Examples of repositories
    • Library of Congress
      • http://memory.loc.gov/cgi-bin/oai2_0
    • ContentDM at UIUC
      • http://images.library.uiuc.edu:8081/cgi-bin/oai.exe
    • Ohio State Knowledge Bank
      • https://kb.osu.edu/dspace-oai/request
  • 13. PictureAustralia
    • Aggregates from large institutions
    • Web crawling for small ones
    • Flickr for individuals
    “Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”
    Source: http://www.pictureaustralia.org/schemas/pa/index.html
  • 14. DRIVER – aggregation as an infrastructure
  • 15. Europeana
  • 16. IVOA – synchronization of service repositories
  • 17. Turnkey systems
    • ContentDM: http://contentdm.com/
    • Digitool: http://www.exlibrisgroup.com/digitool.htm
    • DSpace: http://www.dspace.org/
    • EPrints: http://software.eprints.org/
  • 18. Interoperability in practice: quality issues with OAI aggregations
  • 19. Metadata formats
    • DC, QDC, ETDMS, MODS, MARC, EAD, …
      • Require an XML schema
      • Most implementations only use simple DC
  • 20. Example of values found in DC:Date
    • September 29–October 28, 51 AD; 1970
    • second half of IXth century AD; 1978
    • Rebuilt 1984
    • Possibly Vth/VIth century AD; 1935
    • Planted 1985
    • n/a
    • n.d.
    • Mid IInd century AD; 1973
    • Jul-51
    • circa 900 AD
    • ca. 701 BC
    • Begun 14th century
    • 184-?
    • 1839
    • 18–?
    • August 23, 2000
    • between 1827 and 183
    • VIIIth/IXth century AD ? (TC);1965
    • Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982
    • XVIII Dynasty
    • Winter 2003
    • era of redevelopment
    • various
    • 2002-00
    • 1980, refurbished 1997
    • China: Neolithic Period (5000 BCE-ca 1600 BCE)?
    • 19691968
    • 21. Nouemb. Anno. 1564 .
    • And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe [1479]]
    • 19193
    • xxxx Oct xx
    • Various
    • 1938-05-38
    • 1963 to 1953
    • [not after 1579]
    • 163[5?]
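One way an aggregator could triage values like these is to test which ones a machine can read as dates at all. The sketch below assumes the date-only forms of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD) as the target encoding. That choice is an assumption for illustration, not something the slides prescribe, and `is_w3cdtf_date` is a hypothetical helper.

```python
import re
from datetime import date

# Date-only W3CDTF forms: YYYY, YYYY-MM, YYYY-MM-DD (time forms omitted).
W3CDTF = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_w3cdtf_date(value):
    """Return True if `value` is a well-formed, real calendar date."""
    match = W3CDTF.match(value.strip())
    if not match:
        return False
    # Missing month or day defaults to 1 so the calendar check still runs.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


samples = [
    "1839",
    "August 23, 2000",
    "2002-00",
    "1938-05-38",
    "19193",
    "n.d.",
    "circa 900 AD",
]
valid = [value for value in samples if is_w3cdtf_date(value)]
print(valid)
```

Values that fail the check are not necessarily wrong for a human reader. They simply need normalisation, or a separate display field, before they can support date searching or sorting.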
  • 21. Who is metadata made for?
      • Machine
        • dc:type “Text.Correspondence.Letter”
        • dc:language “wln”
      • Human
        • dc:type Correspondence
        • dc:language “wallon”
      • Who knows?
        • dc:date “197-”
        • dc:description “First ed. Cf. BM.”
  • 22. Improving quality
      • Quality certificates for open access repositories
        • DINI - Deutsche Initiative für Netzwerkinformation
      • Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library
        • http://www.diglib.org/pubs/dlf108.pdf
      • Meeting with software providers
      • Test environment (e.g. Europeana)
      • Community guidelines
  • 23. Conclusion
      • Has the protocol “crossed the chasm”?
      • The objective is to create a network of repositories rather than networking individual resources
        • Lack of specific mechanism to relate resources to each other
        • Approach to linked data and OAI-ORE
  • 24. References
    • OAI-PMH
      • http://www.openarchives.org/pmh/
    • Best practices for OAI and shareable metadata
      • http://www.diglib.org/pubs/dlf108.pdf
    • Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007
    • Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008