Open Archives Initiative Protocol for Metadata Harvesting


Dublin Core conference 2009 Seoul, Oct 2009

  1. The Open Archives Initiative Protocol for Metadata Harvesting
     Muriel Foulonneau, Tudor Research Centre, [email_address]
     10/2009, Dublin Core conference 2009, Seoul
  2. The protocol was born
     - To create a minimal layer of interoperability between distributed repositories of scientific publications
       - An alternative to federated search
       - Networking of digital repositories
  3. “OAI divides the world between data providers and service providers”
  4. Sharing metadata: data aggregation
     - The portal gathers metadata and implements its own retrieval system
     - e.g. search engines, union catalogs, OAI
     [Diagram: metadata records such as <title>My resource</title> gathered by the portal]
  5. The OAI framework
     [Diagram: service provider (harvester, portal interface), aggregator, data providers with their repositories]
     - Mechanisms to transfer large datasets
       - Resumption tokens
       - Incremental harvesting
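A minimal Python sketch of how a harvester can walk a large ListRecords result by following resumption tokens. The helper name `harvest`, the injected `fetch` callable (so the loop can run without network access) and the example base URL are assumptions for illustration; the OAI-PMH 2.0 namespace and the rules that a follow-up request carries only the verb and the token, and that an empty token closes the list, come from the protocol.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(base_url: str, fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> of a ListRecords harvest, following resumption tokens.

    `fetch` (hypothetical) takes a request URL and returns the response body as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for record in root.iterfind(".//oai:ListRecords/oai:record", OAI_NS):
            yield record
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # No token (complete list) or an empty token (last part) ends the harvest.
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Error responses such as noRecordsMatch are not handled here; a production harvester would check for an `<error>` element before reading records.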
  6. Incremental harvest
     [Diagram: the harvester asks the data providers “What’s new since the last time I came?”]
     - New or modified records
     - Deleted records
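The incremental harvest can be sketched as follows: the harvester sends the date of its last visit in the `from` argument, then updates its local copy, adding or replacing new and modified records and dropping those whose header carries `status="deleted"`. The helper names and the choice to store only identifier to datestamp are simplifying assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_url(base_url: str, last_harvest: str, prefix: str = "oai_dc") -> str:
    """Request only records created, modified or deleted since `last_harvest`."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": prefix, "from": last_harvest})


def apply_changes(response_xml: str, store: dict) -> dict:
    """Update a local copy `store` (identifier -> datestamp) from one response page."""
    root = ET.fromstring(response_xml)
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            store.pop(identifier, None)  # deleted at the source: drop the local copy
        else:
            store[identifier] = header.findtext(OAI + "datestamp")  # new or modified
    return store
```

Deletions are only reported by repositories that declare `deletedRecord` support (persistent or transient) in their Identify response; with `no`, a harvester cannot learn about deletions incrementally.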
  7. OAI is based on standards
     - HTTP protocol
     - XML and XML Schemas
     - Dublin Core
  8. Multiple representations of an object
     [Figure: Dublin Core, MARC21 and MODS records describing the same object, “School of arts for girls” (Kiz Sanayi Mektebi)]
  9. OAI repositories can be organized in sets
     - Sets enable selective harvesting
     - Sets can overlap: one item can belong to several sets
     - Sets can be described (e.g. with DC or DC Collection)
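Selective harvesting and overlapping sets can be illustrated with a short sketch: the `set` argument restricts a request to one set, and a single header may list several `<setSpec>` elements. The function names and the set names in the usage (`emblems`, taken from the sample record on slide 11, and a hypothetical `prints`) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def selective_url(base_url: str, set_spec: str, prefix: str = "oai_dc") -> str:
    """Ask only for the identifiers of items belonging to `set_spec`."""
    return base_url + "?" + urlencode(
        {"verb": "ListIdentifiers", "metadataPrefix": prefix, "set": set_spec})


def items_by_set(response_xml: str) -> dict:
    """Map each setSpec to the identifiers listed under it in a ListIdentifiers response."""
    sets = defaultdict(list)
    for header in ET.fromstring(response_xml).iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        # A header may carry several <setSpec> elements: sets can overlap.
        for spec in header.findall(OAI + "setSpec"):
            sets[spec.text].append(identifier)
    return dict(sets)
```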
  10. OAI supports 6 verbs
      - Identify
      - ListSets
      - ListRecords
      - ListMetadataFormats
      - ListIdentifiers
      - GetRecord
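Every OAI-PMH request is an HTTP GET with a `verb` argument plus verb-specific arguments. The sketch below encodes the required and optional arguments of the six verbs as given in OAI-PMH 2.0 and rejects invalid combinations before sending; the function name `oai_request` and the base URL in the usage are hypothetical.

```python
from urllib.parse import urlencode

# Arguments per verb in OAI-PMH 2.0: (required, optional).
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
# Verbs whose (possibly long) lists may be split with a resumptionToken.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def oai_request(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL, rejecting argument sets the protocol forbids."""
    if verb not in VERBS:
        raise ValueError(f"badVerb: {verb}")
    required, optional = VERBS[verb]
    given = set(args)
    if "resumptionToken" in given:
        # resumptionToken is exclusive: it must be the only argument besides the verb.
        if verb not in RESUMABLE or len(given) != 1:
            raise ValueError("badArgument: resumptionToken must be used alone")
    else:
        missing = required - given
        unknown = given - required - optional
        if missing or unknown:
            raise ValueError(f"badArgument: missing {sorted(missing)}, unknown {sorted(unknown)}")
    return base_url + "?" + urlencode({"verb": verb, **args})
```

For example, `oai_request("http://example.org/oai", "ListRecords")` raises `ValueError` because `metadataPrefix` is missing, mirroring the repository's own `badArgument` error.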
  11. An OAI response
      <record>
        <header>
          <identifier></identifier>
          <datestamp>2003-10-22</datestamp>
          <setSpec>emblems</setSpec>
        </header>
        <metadata>
          <oai_dc:dc xmlns:oai_dc="" xmlns:dc="" xmlns:xsi="" xsi:schemaLocation="">
            <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
            <dc:identifier>,324</dc:identifier>
          </oai_dc:dc>
        </metadata>
      </record>
      - The <about> section is often not used
        - e.g. to state rights on the metadata record
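A record like the one above can be flattened into a plain structure with Python's standard ElementTree. This is a sketch assuming `oai_dc` metadata; the function name `parse_record` and the identifier used when trying it out are hypothetical, while the three namespace URIs are the standard OAI-PMH 2.0, oai_dc and DC element set namespaces.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC = "{" + NS["dc"] + "}"


def parse_record(record: ET.Element) -> dict:
    """Flatten one OAI <record> carrying oai_dc metadata into a plain dict."""
    header = record.find("oai:header", NS)
    fields = {}
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    if dc is not None:  # deleted records have no <metadata> element
        for element in dc:
            # Simple DC elements are repeatable, so each field maps to a list.
            name = element.tag.replace(DC, "dc:")
            fields.setdefault(name, []).append((element.text or "").strip())
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "deleted": header.get("status") == "deleted",
        "metadata": fields,
    }
```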
  12. Examples of repositories
      - Library of Congress
      - ContentDM at UIUC
      - Ohio State Knowledge Bank
  13. PictureAustralia
      - Aggregates metadata from large institutions
      - Web crawling for small ones
      - Flickr for individuals
      “Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”
  14. DRIVER – aggregation as an infrastructure
  15. Europeana
  16. IVOA – synchronization of service repositories
  17. Turnkey systems
      - ContentDM
      - DigiTool
      - DSpace
      - EPrints
  18. Interoperability in practice
      - Quality issues with OAI aggregations
  19. Metadata formats
      - DC, QDC, ETDMS, MODS, MARC, EAD, …
        - Require an XML schema
        - Most implementations only use simple DC
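A harvester can discover which formats a repository offers, and the schema for each, through ListMetadataFormats. The sketch below reads that response and picks the first format from the harvester's own preference list, falling back to `oai_dc`, which OAI-PMH requires every repository to support. The function names and the preference list are hypothetical, and prefixes such as `mods` or `marc21` are common conventions rather than fixed names.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def metadata_formats(response_xml: str) -> dict:
    """Map metadataPrefix -> XML schema URL from a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return {
        fmt.findtext(OAI + "metadataPrefix"): fmt.findtext(OAI + "schema")
        for fmt in root.iter(OAI + "metadataFormat")
    }


def pick_format(available: dict, preferred: list) -> str:
    """Return the first preferred prefix the repository offers, else oai_dc
    (mandatory in OAI-PMH, so it is assumed to be available)."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    return "oai_dc"
```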
  20. Examples of values found in dc:date
      - September 29–October 28, 51 AD; 1970
      - second half of IXth century AD; 1978
      - Rebuilt 1984
      - Possibly Vth/VIth century AD; 1935
      - Planted 1985
      - n/a
      - n.d.
      - Mid IInd century AD; 1973
      - Jul-51
      - circa 900 AD
      - ca. 701 BC
      - Begun 14th century
      - 184-?
      - 1839
      - 18–?
      - August 23, 2000
      - between 1827 and 183
      - VIIIth/IXth century AD ? (TC);1965
      - Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982
      - XVIII Dynasty
      - Winter 2003
      - era of redevelopment
      - various
      - 2002-00
      - 1980, refurbished 1997
      - China: Neolithic Period (5000 BCE-ca 1600 BCE)?
      - 19691968
      - 21. Nouemb. Anno. 1564 .
      - And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe [1479]]
      - 19193
      - xxxx
      - Oct xx
      - Various
      - 1938-05-38
      - 1963 to 1953
      - [not after 1579]
      - 163[5?]
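One way an aggregator can measure this problem is to test which dc:date values follow the W3CDTF profile of ISO 8601, the encoding DCMI recommends. The sketch below accepts only year, year-month and full-date precision and also rejects impossible dates; the function name is hypothetical, and many rejected values (ranges, BC dates, "circa") are legitimate descriptions that simply cannot be sorted or filtered by machine.

```python
import re
from datetime import date

# W3CDTF at year, year-month or full-date precision: YYYY[-MM[-DD]].
W3CDTF = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_w3cdtf_date(value: str) -> bool:
    """True if `value` is a well-formed YYYY[-MM[-DD]] string naming a real date."""
    m = W3CDTF.match(value.strip())
    if not m:
        return False
    # Missing month or day defaults to 1 so that datetime.date can validate the rest.
    year, month, day = (int(g) if g else 1 for g in m.groups())
    try:
        date(year, month, day)  # rejects e.g. month 00 or day 38
    except ValueError:
        return False
    return True
```

Applied to the list above, `1839` passes, while `2002-00` and `1938-05-38` match the pattern but name no real date, and `19691968` or `August 23, 2000` fail the pattern itself.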
  21. Who is metadata made for?
      - Machine
        - dc:type “Text.Correspondence.Letter”
        - dc:language “wln”
      - Human
        - dc:type “Correspondence”
        - dc:language “wallon”
      - Who knows?
        - dc:date “197-”
        - dc:description “First ed. Cf. BM.”
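The contrast between machine- and human-oriented values can be made concrete: a coded value such as "wln" (the ISO 639-2 code for Walloon) can be turned into a display label by lookup, whereas free text such as "wallon" can only be passed through unchanged. The lookup tables and the function name below are hypothetical and deliberately tiny; a real service would load the full ISO 639 list and its own type vocabulary.

```python
# Hypothetical lookup tables for illustration only.
LANGUAGE_LABELS = {"wln": "Walloon", "fre": "French", "eng": "English"}
TYPE_LABELS = {"Text.Correspondence.Letter": "Letter"}


def display_value(element: str, value: str) -> str:
    """Turn a machine-oriented value into a label for human display,
    passing through values that are already free text (or unknown)."""
    table = {"dc:language": LANGUAGE_LABELS, "dc:type": TYPE_LABELS}.get(element, {})
    cleaned = value.strip()
    return table.get(cleaned, cleaned)
```

The reverse direction, mapping "wallon" back to a code for faceted search, requires guessing, which is why coded values are easier to share across aggregations.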
  22. Improving quality
      - Quality certificates for open access repositories
        - DINI (Deutsche Initiative für Netzwerkinformation)
      - Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library
      - Meetings with software providers
      - Test environments (e.g. Europeana)
      - Community guidelines
  23. Conclusion
      - Has the protocol “crossed the chasm”?
      - The objective is to create a network of repositories rather than to network individual resources
        - No specific mechanism to relate resources to each other
        - Towards linked data and OAI-ORE
  24. References
      - OAI-PMH
      - Best practices for OAI and shareable metadata
      - Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007
      - Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008