Harvesting Repositories: DPLA, Europeana, & Other Case Studies

Join this discussion on the benefits and process of harvesting to aggregators such as DPLA and Europeana. Through case studies we'll outline three stages of the process: 1) mapping, migrating, and normalizing data in open source digital repositories, 2) making use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and 3) reaping the benefits of increased exposure. Presenters welcome lively discussion and questions from participants of all technical backgrounds and skill levels.

Published in: Software

  1. 1. Harvesting Repositories: DPLA, Europeana, and Other Case Studies. ALA Conference, June 25, 2016
  2. 2. Introductions Erin Tripp, Bus. Dev. Staff librarian since 2011. Erin delivers Islandora training at events worldwide and has managed more than 40 digital repository projects. Contact Details ●  Email: erin@discoverygarden.ca ●  Twitter: @eeohalloran or @discgarden ●  Hashtags: #islandora #ALAAC16
  3. 3. Agenda Objectives Overview By Show of Hands & Introductions Why Should We Care? Repository Requirements OAI-PMH Overview Case Studies Top Takeaways
  4. 4. Objectives for Today Learn a thing or two about: ●  OAI-PMH ●  Common Harvesters ●  Who to ask for help ●  What questions to ask ●  Confidence to continue learning/ try a new tool
  5. 5. By Show of Hands... Who is interested in ●  National Harvester, ●  State Harvester, ●  Subject Harvester, or ●  Proprietary Discovery Service Harvester? Who has already been involved in a harvesting project? Who has experience using ●  XSLTs ●  OAI-PMH ●  REPOX?
  6. 6. Why should we care? Discoverability.
  7. 7. Why should we care? Discoverability. In February 2015, LITA panelists said Top Technology Trends include enhancing discoverability (Enis, 2015). Making content accessible where the search originates (e.g. Google, Google Scholar, WorldCat, DPLA, Europeana) creates value for digital libraries and users. Repositories contributing to aggregators can see site visits increase by 55 to 109 per cent (DPLA, n.d.).
  8. 8. Why should we care? Discoverability. Increased exposure through blogs, social media and Wikipedia. Provide richer context and increase the visibility of your collections. Make your collections available for re-use by other services (Europeana, n.d.). Access to valuable skills: ●  Data modelling ●  Copyright and licensing ●  Reporting on access usage analytics (Europeana, n.d.)
  9. 9. Why should we care? Discoverability. Using open source: ●  Linking up to thousands of other collections ●  Interoperable (no vendor lock-in / proprietary formats) ●  Access to Wikimedia Commons (Europeana, n.d.) Expanding your network: ●  Connect with like-minded industry professionals ●  Identify potential partners and joint funding opportunities ●  Reach out to other sectors: creatives, education, tourism and more (Europeana, n.d.)
  10. 10. Why should we care? Discoverability. Anecdotally, repository harvest can: ●  Act as incentive for people to deposit content into the repository / buy-in from stakeholders ●  Clean up and normalize metadata resulting in better raw material to support discovery
  11. 11. OAI-PMH Overview
  12. 12. OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Low-barrier mechanism for repository interoperability. OAI-PMH is a set of six requests (aka verbs or services) that are invoked within HTTP.
  13. 13. Providers: Data Providers are repositories that expose structured metadata via OAI-PMH (= Repository). Service Providers then make OAI-PMH service requests to harvest that metadata (= Harvester).
  14. 14. Vocabulary. Request / Verb / Service: The action that the service provider (harvester) is requesting from the data provider (repository). Response Size: The maximum number of records to issue per response.
  15. 15. Vocabulary… continued. Resumption Token: When a request matches more records than the response size, a resumptionToken is issued so that the service provider can resume harvesting from where it left off. Identify: This request is used to retrieve information about a repository. Some of the information returned is required as part of the OAI-PMH. Example: YourSite/oai2?verb=Identify
  16. 16. Vocabulary… continued. ListMetadataFormats: This request is used to retrieve the metadata formats available from a repository. Example: YourSite/oai2?verb=ListMetadataFormats ListRecords: This request is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp. Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc
  17. 17. Vocabulary… continued. ListSets: This request is used to retrieve the set structure of a repository, useful for selective harvesting. All Collections Example: YourSite/oai2?verb=ListSets Specific Collection Example: YourSite/oai2?verb=ListRecords&metadataPrefix=oai_dc&set=ir_citationCollection
  18. 18. Repository Requirements: Accessible to the web. Stores standards-based, XML descriptive metadata. The ability to apply additional metadata mapping if needed (whether within or external to the repository). Access to documentation and XSLTs used for metadata mapping.
  19. 19. Repository Requirements: Pass XML metadata to the service provider from the: 1.  Preservation (storage) component or 2.  Discovery (index) component. Provide a method to harvest a TN (thumbnail) and link back to the repository. Accommodate customization.
  20. 20. Repository Requirements … Continued. For example: the University of South Carolina video content model is tiered for preservation, media production and streaming web access. We only want to harvest one of the three possible records.
  21. 21. Case Study Europeana
  22. 22. Europeana Our material comes from all over Europe and the scope of the collections is really quite astonishing. [...] http://www.europeana.eu/ http://pro.europeana.eu/
  23. 23. Intermediate Aggregator: The Digibess repo stores digitized objects from 18 Economic and Social Sciences libraries in Italy. Europeana requires an intermediate aggregator: a national harvester such as Cultura Italia. Cultura Italia harvests a custom “Pico” metadata format from Digibess and then is harvested by Europeana.
  24. 24. Harvesting Tools: Digibess pre-dated the Islandora OAI module and the REPOX aggregator. Used Proai servlet oaiprovider-1.2.2. The harvest prompted an examination of the general needs and specific applications of the protocol.
  25. 25. Digibess on Europeana
  26. 26. REPOX: Since the Digibess project, a new intermediate aggregator called REPOX has been released. It aims to provide [...] Europeana partners a simple solution to import, convert and expose their bibliographic data via OAI-PMH http://repox.sysresearch.org/
  27. 27. Case Study Digital Public Library of America (DPLA)
  28. 28. DPLA The Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. https://dp.la/info/
  29. 29. Service Hub: Empire State Digital Network (ESDN) is the New York State service hub for the DPLA. Hosted and administered by the Metropolitan New York Library Council in conjunction with eight allied regional library councils working collectively in New York State as the ESLN. Liaises with partners for data aggregation, mapping and licensing.
  30. 30. Mapping & Testing: Harvests from partners using OAI-PMH ○  Provides all partner metadata to DPLA through one OAI-PMH feed from REPOX. Undertakes data review and QA prior to exposing the feed to DPLA for harvest.
  31. 31. ESDN on DPLA
  32. 32. Case Study Other Discovery Services
  33. 33. Other Discovery Services: WorldCat, Summon, & Primo are commercial discovery services. Local discovery layers can also collocate resources for discovery. OAI-PMH modules within your repository framework can allow these services to harvest your repository.
  34. 34. Everyone is Harvesting Everyone: Connecticut State Library is aggregating data to Research It. The State Library harvests University of Connecticut Archives and Special Collections, ILS and other. University of Connecticut Library harvests to Summon/Primo and will be harvested by DPLA.
  35. 35. Creating Lots of Portals: University of Connecticut Library started harvesting in mid-2014. Notable increases in access to digital content since the harvest (one of many factors). Access statistics available at CTDA Statistics.
  36. 36. University of Connecticut on Research It - EBSCOhost
  37. 37. Harvesting Top Takeaways
  38. 38. Top Takeaways - Data Providers ●  Server Load/ Application Load ●  Permissions / Copyright ●  Relationships with Service Providers ●  Repository Buy-in ●  Increased Discovery ●  Metadata Normalization
  39. 39. Top Takeaways - Service Providers ●  Knowledge of ○  XSLT, ○  OAI-PMH, and ○  Metadata schemas (DC, MODS, QDC, MARC XML) ●  Technical staff to set up and maintain the aggregator & write scripts to transform harvested metadata ●  Relationships with Data Providers
  40. 40. Harvesting Discussion
  41. 41. Discussion ●  What are your biggest challenges? ●  What Resources do you find helpful? ●  What was your AH HA! moment? ●  What was most useful in this presentation?
  42. 42. Harvesting Demonstration
  43. 43. Demonstration To follow along or try it at home, navigate to…. http://sandbox.discoverygarden.ca/ OR http://islandora.ca/downloads Click Islandora > Islandora Utility Modules > Islandora OAI
  44. 44. Questions? Contact us at: erin@discoverygarden.ca
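
The example requests on the Vocabulary slides (slides 15 to 17) can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the endpoint `https://example.org/oai2` and the helper name `build_request` are assumptions, standing in for `YourSite/oai2`. The six verb names are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs ("requests" or "services" in the slides).
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return an OAI-PMH request URL (hypothetical helper, for illustration)."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb}
    # Arguments passed as None are treated as "not supplied".
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


BASE = "https://example.org/oai2"  # assumed endpoint, replace with your own

identify = build_request(BASE, "Identify")
formats = build_request(BASE, "ListMetadataFormats")
all_sets = build_request(BASE, "ListSets")
one_collection = build_request(
    BASE, "ListRecords", metadataPrefix="oai_dc", set="ir_citationCollection"
)
# "from" is a Python keyword, so a datestamp filter is passed via a dict.
since_2016 = build_request(
    BASE, "ListRecords", metadataPrefix="oai_dc", **{"from": "2016-01-01"}
)
```

Pasting any of the resulting URLs into a browser is a quick way to check what a harvester will see from your repository.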
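
The resumptionToken behaviour described on slide 15 can be sketched as a small harvesting loop. This is a minimal sketch under stated assumptions, not how REPOX or any particular harvester is implemented: the names `fetch_page` and `harvest` are hypothetical, and error handling is reduced to raising on any OAI-PMH `error` element. It relies on two rules from the OAI-PMH 2.0 specification: a follow-up request carries only the verb and the resumptionToken, and an empty or absent resumptionToken marks the last page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag form.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_page(url):
    """Download one response page as bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None, fetch=fetch_page):
    """Yield every <record> element, following resumptionTokens.

    `fetch` is injectable so the loop can be tried without a live server.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{OAI}error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iter(f"{OAI}record")
        token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one: this was the last page
        # Resume where the previous page left off; no other arguments allowed.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A usage sketch, again with an assumed endpoint: `for record in harvest("https://example.org/oai2", set_spec="ir_citationCollection"): ...`. Because each page is a separate HTTP request, a smaller response size on the repository side means more requests but a lighter load per request, which is the server load trade-off listed in the Data Providers takeaways.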
