OAI-PMH protocol validation and data extraction tool


OAIPMH validator is a web application which enables validation and data extraction from OAI-PMH enabled digital libraries. Features include:
* Check OAI-PMH standards compliance.
* Check compliance with Dublin Core (DC).
* Check compliance with Europeana Semantic Elements (ESE).
* View, print or download the output of all OAI-PMH supported commands.
* Detect problems with metadata records (e.g. invalid URLs, empty titles, invalid date formats).
* Download all records from one or more digital libraries in parallel.
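
As a rough illustration of what "all OAI-PMH supported commands" covers, the sketch below builds request URLs for the six verbs defined by OAI-PMH 2.0. The function name `build_request_url` and the example base URL are assumptions made for this example; they are not taken from the tool itself.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH 2.0 specification.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **params):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb always goes first, followed by verb-specific arguments
    # such as metadataPrefix, set or identifier.
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # http://example.org/oai is a placeholder endpoint, not a real library.
    for verb in ("Identify", "ListMetadataFormats", "ListSets"):
        print(build_request_url("http://example.org/oai", verb))
```

A validator would presumably issue each of these requests in turn and record which ones the repository answers without an error.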

Published in: Technology, Education

  1. Open Archives Initiative Protocol for Metadata Harvesting Validator & Data Extraction Tool
     Vangelis Banos, Information & Communication Systems Engineer, MSc
     email: vbanos [at] gmail [dot] com
  2. Creating an OAI-PMH validation tool
     * The process of validating an OAI-PMH enabled digital library is quite complex and may become tedious when dealing with a large number of digital libraries.
     * Digital libraries must comply with DC and ESE.
     * Extra sanity checks must be performed in order to ensure their correctness.
     * The aforementioned tasks are overwhelming when dealing with a large number of libraries.
  3. About
     * The validator is a free web application capable of performing all the checks required to ensure that an OAI-PMH enabled digital library is technically ready to be part of Europeana.
  4. Features
     * Validation: Validating an OAI-PMH enabled digital library requires only the submission of the OAI-PMH web service URL. The user is then presented, in real time, with a checklist of the validation checks that have been performed and their results.
     * Metadata extraction: Users can provide the system with a list of OAI-PMH URLs and retrieve all the metadata records available from them in parallel. This lets users retrieve a large number of metadata records from multiple libraries quickly and easily, so that they can inspect and evaluate them.
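
The parallel metadata extraction described above could be sketched roughly as follows. This is a minimal illustration and not the tool's actual implementation: the names `fetch` and `harvest_all`, the choice of a thread pool, and the `oai_dc` metadata prefix are all assumptions. It requests only the first `ListRecords` page from each library; a complete harvester would also follow resumption tokens.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest_all(base_urls, fetcher=fetch, max_workers=4):
    """Request the first ListRecords page from every library in parallel.

    Returns a list of (base_url, body, error) tuples in input order.
    `fetcher` is injectable so the function can be tried without a network.
    """

    def harvest_one(base_url):
        request_url = f"{base_url}?verb=ListRecords&metadataPrefix=oai_dc"
        try:
            return base_url, fetcher(request_url), None
        except Exception as exc:  # one failing library must not stop the rest
            return base_url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(harvest_one, base_urls))


if __name__ == "__main__":
    # Placeholder endpoints and a fake fetcher, for illustration only.
    demo = harvest_all(
        ["http://a.example/oai", "http://b.example/oai"],
        fetcher=lambda url: url.encode(),
    )
    for base_url, body, error in demo:
        print(base_url, "failed" if error else f"{len(body)} bytes")
```

Threads suit this workload because each request spends most of its time waiting on the network, and capturing errors per library keeps one unreachable endpoint from aborting the whole batch.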
  5. Validation
     * HTTP protocol validation
       * HTTP request / response headers
       * HTTP content type, response size & type, encoding, charset, time
     * XML document validation
       * Using standard XML validation techniques (e.g. XML tidy library)
     * XML Schema validation
       * http://oaipmh.com/files/OAI-PMH.xsd
       * http://www.europeana.eu/schemas/ese/ESE-V3.3.xsd
     * OAI-PMH protocol validation
       * Check for supported commands and metadata prefixes
     * XML document content validation
       * Check for invalid record URLs, invalid email addresses
       * Check for empty or malformed fields (e.g. title, description, setSpec)
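
Two of the checks listed above, XML well-formedness and content checks for empty titles and invalid record URLs, might look roughly like the sketch below. It uses only the Python standard library. The function name `check_records` and the rule for deciding what counts as an invalid URL are assumptions for illustration; the tool's real rules are not given in the slides. XML Schema validation against the XSD files is not covered, since the standard library has no schema validator.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def check_records(xml_text):
    """Return a list of human-readable problems found in a ListRecords response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    for record in root.iter(f"{{{NS['oai']}}}record"):
        ident = record.findtext(
            "oai:header/oai:identifier", default="?", namespaces=NS
        )
        # A record needs at least one dc:title with non-whitespace text.
        titles = record.findall(".//dc:title", NS)
        if not any((title.text or "").strip() for title in titles):
            problems.append(f"{ident}: empty or missing title")
        # Assumed rule: only identifiers that look like web addresses are
        # checked, and they must have an http(s) scheme and a host name.
        for element in record.findall(".//dc:identifier", NS):
            value = (element.text or "").strip()
            if value.startswith("http") or "://" in value:
                parsed = urlparse(value)
                if parsed.scheme not in ("http", "https") or not parsed.netloc:
                    problems.append(f"{ident}: invalid URL {value!r}")
    return problems
```

Returning a flat list of messages keeps the sketch close to the checklist-style report the slides describe, with one entry per detected problem.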
  6. Benefits
     * The tool has improved the process of validating new and existing OAI-PMH enabled libraries.
     * Administrators are able to evaluate digital libraries using a quick and intuitive tool. Moreover, because access is free, anyone can take advantage of it.
     * Regular users include:
       * The Hellenic Aggregator (http://aggregator.libver.gr)
       * Greek digital libraries search engine
  7. Example Inputs
     * DOAJ
     * Μέδουσα
  8. Future work
     * Roadmap:
       * Add more validation rules
       * Support more metadata formats (such as Europeana Data Model)
       * Improve service robustness & performance
       * Create a public API to encourage third-party usage
     * Please give us your feedback to help improve the tool.