Harvesting Using the Open Archives Initiative Protocol: What Your OAI Stream Can Tell You


Webinar from the Mountain West Digital Library
Sandra McIntyre, MWDL Director
Anna Neatrour, MWDL Digital Metadata Librarian

Want to understand what happens behind the scenes with MWDL harvesting? In this webinar, Sandra McIntyre and Anna Neatrour will explain the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and how it makes metadata aggregation possible in the MWDL. They will explain the process of harvesting and how MWDL normalizes your metadata. They will also show you how you can learn from your collections' OAI stream by using the six query verbs (requests) defined in the OAI-PMH.

Published in: Technology, Education

  • See this in slideshow view to see the animations!
  • Registered at http://openarchives.org
  • Open Access repositories of scholarly communications materials
  • MOVE UP. Find your base URL. Add to OAI page. Update OAI page with information about base URL for other platforms: Omeka, CONTENTdm, BePress. MWDL Harvesting Log: example to wind up and complete process, what Primo is doing with OAI. Normalization routines are run. Counter examples: mapping that is wrong. May not have set enabled for OAI. Metadata formats associated with OAI: Dublin Core among others. OAI provider may or may not be configured to provide qualified Dublin Core.
  • awhof = Arizona Women’s Hall of Fame
  • ANNA
  • Use Identify to make sure that the OAI provider is set up and working. This is a great query to use if you are uncertain of the OAI provider URL for your digital asset management system and want to test it to be sure.
  • This is the information that is returned from an Identify query. You will see here we have the repository name, and also the contact information for the person who administers the server.
  • ListSets asks what sets are available for harvest. This is a great thing to check yourself to make sure that all the collections are enabled for harvest that you want, or if you have a digital collection with some sort of restriction like on-campus access only, you can check to make sure that it isn’t available for harvest.
  • The set spec or alias for each collection is listed. If you have a new collection that you want to be added to MWDL, the set spec is one of the pieces of information I’ll need in order to get the harvesting set up.
  • What metadata formats are available?
  • Here you will notice that both simple Dublin Core and qualified Dublin Core are available from the SUU server. MWDL prefers to harvest in qualified Dublin Core if possible.
  • In real life if you are playing around with OAI queries in your browser, you might not run this, because it gives you all of the records from the available collections in qualified dublin core. That’s a lot of records! This is typical of the type of request that MWDL would make to harvest records, in whatever type of batch the server is set up to share.
  • Here we can see some records coming in from SUU. I can see the set spec hist_photos and go down and see the first record coming in, including all the descriptive information that is made available for that record.
  • I like to check the OAI for one set at a time when I’m checking out metadata to make sure that it matches up with the MWDL Dublin Core Application profile. This is something you can do too, if you want to do a quick check to make sure that all of the required fields are showing up correctly. You can also look at one record at a time by using the identifier associated with that record.
  • CONTENTdm’s OAI provider can be accessed from the server tab; click on harvesting to see the controls. Here, we’d want “enable OAI” set to “yes”. This is also where I could look up the base URL for my repository if I wasn’t sure what it was. I could change the name of the repository to something more descriptive and include server admin e-mails. I would want to leave “enable compound object pages” set to “no”. If that’s enabled, all the individual pages would be harvested as single objects, and MWDL would end up with thousands of items called “page 1” or “page 2”. By default, if no collections are specified, everything is published. You might run into a situation where you want to expose some collections but not others for harvesting, in which case you would need to add the set spec or alias for each collection that should be harvested.
  • Here we can take a look at what a record with local field labels looks like vs the same record’s information in OAI. Notice how the local field labels disappear, so the classification information from the Western Soundscape Archive is all mapped to dc:description.
  • Repeated fields are merged into one in the MWDL. For example, the local record had multiple contributors listed; this information is now in one field. The source record also had separate rights statements for the creator of the sound recording of an animal and the creator of a photo of the animal. These statements are now in one field.
  • Here we can see the same record with slightly different information displayed in MWDL and DPLA. DPLA has different normalizing routines. For example, the designation of the digital collection associated with the record as Western Soundscape Archive isn’t in the DPLA record, but people can still click through and view that information at the source record.
  • You can do some self-auditing to make sure that everything in your local collection is displaying in the manner in which you would like it to be harvested. We have a CONTENTdm field properties template and guide that you can use to help make sure everything is set up correctly.
  • Western Soundscape Archive field properties. See where things have been set to “no” for “hide” and mapped to Dublin Core. If some of these fields were unmapped or set to hidden, they would not appear in the OAI stream for the collection.
  • We have an OAI queries page with quick links to try for everything we went over during this presentation. This is also where you can find the CONTENTdm field properties template.
  • Thanks for participating in the webinar today! If you have any follow-up questions please feel free to contact me!
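
The self-audit described in the notes above (checking that the required fields show up in a set's OAI stream) could be sketched roughly as follows. This is only an illustration, not MWDL's own tooling: the sample response, the identifiers, and the REQUIRED list are made up for the example, and the sketch reads simple Dublin Core (oai_dc) rather than qualified Dublin Core. Substitute the fields that the application profile you are auditing against actually requires.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical list of required elements (an assumption for this sketch).
REQUIRED = ["title", "date", "rights"]

# Hypothetical ListRecords response with two records; the second one
# is missing dc:date and dc:rights.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:demo/1</identifier>
        <setSpec>demo</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Main Street, looking north</dc:title>
          <dc:date>1912</dc:date>
          <dc:rights>Public domain</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:demo/2</identifier>
        <setSpec>demo</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Untitled photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def missing_fields(xml_text, required=REQUIRED):
    """Map each record identifier to the required dc: elements it lacks."""
    root = ET.fromstring(xml_text)
    report = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        missing = []
        for name in required:
            values = [el.text for el in record.iterfind(f".//dc:{name}", NS)]
            # A field counts as present only if at least one value is non-empty.
            if not any(v and v.strip() for v in values):
                missing.append(name)
        if missing:
            report[identifier] = missing
    return report


if __name__ == "__main__":
    for identifier, fields in missing_fields(SAMPLE).items():
        print(identifier, "is missing:", ", ".join(fields))
```

In practice the XML would come from a ListRecords query for one set, saved from the browser or fetched by a script, rather than from an inline string.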

    1. Harvesting Using the Open Archives Initiative Protocol: What Can Your OAI Stream Tell You? Sandra McIntyre, MWDL Director; Anna Neatrour, MWDL Digital Metadata Librarian
    2. The basics: WHY OAI?
    3. Open Archives Initiative http://openarchives.org “Standards for Web Content Interoperability” • Facilitate the efficient dissemination of content contained in archives/repositories • Low-barrier framework and standards
    4. Why is a protocol necessary? • OAI Harvester: “Give me...” “I want it.” • OAI Provider: “I have it.” “Here is what you requested.”
    5. OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) http://www.openarchives.org/pmh/
    6. OAI Providers
    7. OAI Providers
    8. OAI Harvesters • Mountain West Digital Library http://mwdl.org • OAIster http://oaister.worldcat.org (and included in WorldCat) • Digital Public Library of America http://dp.la/ • Institute of Museum & Library Services Digital Collections and Content http://imlsdcc.grainger.uiuc.edu • ...and thousands more
    9. Harvesting at MWDL • Utah State Archives • Utah State Library • Univ of Nevada Las Vegas • Univ of Nevada Reno • Utah Dvsn Arts & Museums • Salt Lake Comm. College • Arizona Memory Project • Snow College • Northern Arizona Univ • Weber State Univ • Univ of Idaho • Utah State Univ • Family Search • Utah Valley Univ • LDS Church History • Southern Utah Univ • Montana Memory Project • Stacks (Idaho) • BYU • Univ of Utah • Idaho State Archives • Mountain West Digital Library • Boise State Univ.
    10. Why understand OAI? • Predict what will happen with your metadata when it is harvested • Do self-auditing and/or peer auditing of metadata: See patterns and find errors
    11. Other metadata harvesting options • Handing over a hard drive • Uploading/downloading via file transfer protocol (FTP) • Other requests of XML (typically application programming interfaces, APIs): – Web Services – X-Services
    12. Advantages of OAI • Update at a distance, anytime • Specify desired records – By collection – By date range of last change to record • Packets, one at a time • Works fast • Repeatable
    13. Queries and responses: THE PROTOCOL
    14. Queries and Responses • OAI Harvester -> OAI query -> OAI Provider • OAI Provider -> OAI response -> OAI Harvester
    15. Testing an OAI Provider http://re.cs.uct.ac.za/
    16. Queries: OAI BaseURL • BaseURL = OAI provider root address (doesn’t work alone) • Examples: – http://aura.abdn.ac.uk/dspace-oai/request – http://absronline.org/journals/index.php/index/oai – http://cyberleninka.ru/oai – http://digitalcommons.usu.edu/cgi/oai2.cgi – http://www.avhumboldt.net/oai/oai.php
    17. Queries: 6 Verbs • Verb = type of request • Initial capitals; no spaces • Examples: – Identify – ListMetadataFormats – ListSets – ListIdentifiers – ListRecords – GetRecord
    18. Queries: Parameters & Values • Parameters & values = details about request • Format: parameter=value • Examples: – metadataPrefix=oai_dc – metadataPrefix=qdc – set=awhof – identifier=oai:content.lib.utah.edu:etd3/482
    19. Queries you can use: EXAMPLES
    20. Identify • OAI query (OAI Harvester): “Who are you?” http://contentdm.li.suu.edu/oai/oai.php?verb=Identify • OAI response (OAI Provider): “I am the SUU CONTENTdm Server Repository.”
    21. Identify “I am the SUU CONTENTdm Repository.”
    22. ListSets • OAI query (OAI Harvester): “What sets do you have available?” http://contentdm.li.suu.edu/oai/oai.php?verb=ListSets • OAI response (OAI Provider): “Here is the list of sets.”
    23. ListSets “Here’s the list of sets.”
    24. ListMetadataFormats • OAI query (OAI Harvester): “What metadata formats are available?” http://contentdm.li.suu.edu/oai/oai.php?verb=ListMetadataFormats • OAI response (OAI Provider): “Here’s the list of metadata formats.”
    25. ListMetadataFormats “Here’s the list of metadata formats.”
    26. ListRecords • OAI query (OAI Harvester): “Give me the metadata for all records in qualified Dublin Core.” http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_qdc • OAI response (OAI Provider): “Here are the records.”
    27. ListRecords “Here are the records.”
    28. ListRecords • One set only: http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&metadataPrefix=oai_qdc&set=hist_photos • If more than one screen of records, use a resumption token to get the additional lists (200 at a time in this example): http://contentdm.li.suu.edu/oai/oai.php?verb=ListRecords&resumptionToken=hist_photos:200:hist_photos:0000-0000:9999-99-99:oai_qdc
    29. GetRecord • One record only: http://contentdm.li.suu.edu/oai/oai.php?verb=GetRecord&metadataPrefix=oai_qdc&identifier=oai:contentdm.li.suu.edu:hist_photos/0
    30. CONTENTdm’s OAI Provider • Turning on OAI: Administrative interface in the “Server” tab • Choosing which collections to share • Sharing compound object level metadata only • Image from CONTENTdm OAI guide: http://contentdm.org/help6/server-admin/oai.asp
    31. Record -> OAI: Local Record with Labels, OAI
    32. OAI -> MWDL: OAI, MWDL
    33. MWDL -> DPLA: MWDL, DPLA
    34. Some Final Things to Remember • Check your own OAI stream and see what it looks like! – Mapped to none: not in OAI stream – Hidden set to yes: not in OAI stream – CONTENTdm field properties template and guide available at: http://mwdl.org/getinvolved/getinvolved.php – Log in to collection admin, click on tab, go to fields to check and edit properties
    35. Field Mappings in CONTENTdm: Field Mapping example from the Western Soundscape Archive
    36. Try it yourself! Resources available at http://mwdl.org/getinvolved/getinvolved.php
    37. We’re here to help! • For additional questions about self-auditing your OAI, contact Anna Neatrour: – anna.neatrour@utah.edu – 801-587-8883 • Any Questions?
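
The query pattern walked through in the slides (a base URL, then a verb, then parameter=value pairs, with a resumption token for long lists) can be sketched in Python along these lines. It is a minimal illustration under stated assumptions, not the harvester MWDL runs: the function names build_query and next_token, the two sample responses, and the token value are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The six verbs defined by OAI-PMH (initial capitals, no spaces).
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_query(base_url, verb, **params):
    """Join a base URL, a verb, and parameter=value pairs into one query URL."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return base_url + "?" + urlencode(query)


def next_token(response_xml):
    """Return the resumption token in a response, or None if the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return token.strip() if token and token.strip() else None


BASE = "http://contentdm.li.suu.edu/oai/oai.php"

# Hypothetical responses: one with more records to fetch, one final page.
PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>example-token-200</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

LAST_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_query(BASE, "Identify"))
    print(build_query(BASE, "ListRecords", metadataPrefix="oai_qdc", set="hist_photos"))
    token = next_token(PAGE)
    if token is not None:
        # A follow-up request carries only the verb and the token.
        print(build_query(BASE, "ListRecords", resumptionToken=token))
```

A harvesting loop would repeat the last step, requesting each new page with the token from the previous response, until next_token returns None.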