Implementation of the RIOXX Metadata Guidelines in the UK's repositories through a harvesting service
Matteo Cancellieri & Nancy Pontika
CORE
The Open University
@oacore
RIOXX metadata
The RIOXX Metadata Application Profile provides
a mechanism to help institutional repositories
comply with the RCUK policy on open access.
RIOXX focuses on applying consistency to the
metadata fields used to record research funder
and project/grant identifiers and is designed to
support the consistent tracking of open-access
research publications across scholarly systems.
[Source: http://rioxx.net/]
MD) This is an overview of the CORE harvesting process. When harvesting, we run a set of tasks for every data provider (for example, a repository). Here (hand gesture) you can see some of the steps of the pipeline; the highlighted ones, the Metadata Download and RIOXX Compliance tasks, are the ones affected by the adoption of RIOXX in CORE.
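The idea of running the same ordered set of tasks for each data provider can be sketched as below. This is a minimal, hypothetical illustration: the task names mirror the slide, but `HarvestContext`, `Task`, `placeholder` and `harvest` are invented for the example, the task bodies only write to a log, and the real CORE pipeline contains further steps not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HarvestContext:
    """Hypothetical state shared by the tasks run for one data provider."""
    provider_id: str
    log: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    run: Callable[[HarvestContext], None]
    rioxx_impacted: bool = False  # True for the steps highlighted on the slide


def placeholder(name: str) -> Callable[[HarvestContext], None]:
    # Stand-in body: a real task would download, parse or validate records.
    def run(ctx: HarvestContext) -> None:
        ctx.log.append(f"{name}: {ctx.provider_id}")
    return run


# Only the steps named in the talk; the real pipeline has others.
PIPELINE = [
    Task("Metadata Download", placeholder("Metadata Download"), rioxx_impacted=True),
    Task("Extract Metadata", placeholder("Extract Metadata")),
    Task("RIOXX Compliance", placeholder("RIOXX Compliance"), rioxx_impacted=True),
]


def harvest(provider_id: str) -> HarvestContext:
    ctx = HarvestContext(provider_id)
    for task in PIPELINE:  # the same ordered task list runs for every data provider
        task.run(ctx)
    return ctx
```

Calling `harvest("repo-1")` runs the three steps in order; keeping the tasks in a plain list makes it easy to see which of them a format change such as RIOXX touches.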
I am going to say a bit more about the harvesting process. First we download the metadata, mainly using the OAI-PMH protocol. In the past, our focus was on the OAI DC (Dublin Core) format; now, where it is available, we choose the RIOXX metadata format.
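Choosing RIOXX when it is available, and falling back to OAI DC otherwise, can be sketched with the standard OAI-PMH verbs `ListMetadataFormats` and `ListRecords`. This is a minimal sketch, not CORE's code: it assumes the repository advertises RIOXX under the prefix `rioxx` (prefixes are set by each repository, so this should be checked), and the function names and example base URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Assumption: RIOXX is exposed under the prefix "rioxx"; oai_dc is the fallback.
PREFERRED_PREFIXES = ["rioxx", "oai_dc"]


def available_prefixes(list_metadata_formats_xml: str) -> list[str]:
    """Return the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text.strip() for el in root.iterfind(path, OAI_NS) if el.text]


def choose_prefix(list_metadata_formats_xml: str) -> str:
    """Pick the first preferred prefix that the repository actually offers."""
    offered = set(available_prefixes(list_metadata_formats_xml))
    for prefix in PREFERRED_PREFIXES:
        if prefix in offered:
            return prefix
    raise ValueError("repository offers neither RIOXX nor oai_dc")


def list_records_url(base_url: str, prefix: str) -> str:
    """Build an OAI-PMH ListRecords request for the chosen format."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'metadataPrefix': prefix})}"
```

For example, `list_records_url("https://example.org/oai", "rioxx")` yields `https://example.org/oai?verb=ListRecords&metadataPrefix=rioxx`. Resumption tokens for paging through large result sets are left out to keep the sketch short.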
EM) When we download the RIOXX metadata, we parse the XML and store the metadata needed to index each record. We do that in the Extract Metadata task (hand gesture).
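A minimal sketch of the extraction step, pulling a few fields out of a RIOXX record with the standard library. The namespace URIs and the `funder_name` attribute on `rioxxterms:project` reflect our understanding of the RIOXX v2.0 schema and are assumptions to verify; `ExtractedRecord` and `extract_rioxx` are hypothetical names, and only a small subset of fields is kept.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed namespace URIs for a RIOXX v2.0 record; check them against the harvested schema.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rioxxterms": "http://docs.rioxx.net/schema/v2.0/rioxxterms/",
}


@dataclass
class ExtractedRecord:
    title: Optional[str]
    date_accepted: Optional[str]
    # (funder name, grant identifier) pairs taken from rioxxterms:project
    projects: List[Tuple[Optional[str], str]] = field(default_factory=list)


def _text(root: ET.Element, path: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if absent or empty."""
    el = root.find(path, NS)
    return el.text.strip() if el is not None and el.text else None


def extract_rioxx(record_xml: str) -> ExtractedRecord:
    root = ET.fromstring(record_xml)
    projects = [
        (p.get("funder_name"), (p.text or "").strip())
        for p in root.iterfind(".//rioxxterms:project", NS)
    ]
    return ExtractedRecord(
        title=_text(root, ".//dc:title"),
        date_accepted=_text(root, ".//dcterms:dateAccepted"),
        projects=projects,
    )
```

Keeping the extracted values in a small typed structure separates XML parsing from indexing, so the indexer never has to deal with namespaces.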
RC) Further down the pipeline, in the RIOXX Compliance task (hand gesture), we check the records' compliance with the RIOXX guidelines.
Currently we have a service that performs a simple validation of the XML records against a schema: we check the presence and type of the content, but not its quality. This validation is only a first step, but it gives repository managers useful information about their compliance levels.
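The presence side of such a check can be sketched as below, producing one message per problem so that a dashboard can explain why a record fails. This is a simplified, hypothetical stand-in rather than full XSD validation: `REQUIRED` is an illustrative subset, not the complete list of RIOXX mandatory fields, and the namespace URIs are the same assumptions as in the extraction sketch.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace URIs (see the RIOXX v2.0 schema).
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rioxxterms": "http://docs.rioxx.net/schema/v2.0/rioxxterms/",
}

# Illustrative subset of fields treated as mandatory; not the full RIOXX list.
REQUIRED = ["dc:title", "dcterms:dateAccepted", "rioxxterms:project"]


def check_presence(record_xml: str) -> List[str]:
    """Return one message per missing or empty required element; [] means the record passes."""
    root = ET.fromstring(record_xml)
    messages = []
    for name in REQUIRED:
        el = root.find(f".//{name}", NS)
        if el is None:
            messages.append(f"missing {name}")
        elif not (el.text or "").strip():
            messages.append(f"empty {name}")
    return messages
```

Returning messages instead of a single boolean costs nothing extra and gives repository managers the per-record explanation they need to fix their data.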
This is what the RIOXX compliance section looks like in the dashboard.
The results on the RIOXX webpage are based on a sample, while ours cover all the records in a repository. For validation purposes, we compared the RIOXX and CORE results and found them to be consistent overall.
(We also found some interesting cases where the results were significantly different. For example, one repository was 99% compliant according to us and only 5% compliant according to the RIOXX page. When we investigated, we noticed that a single field was causing this huge difference: RIOXX explicitly defines a date format, while we only check for the presence of a date. So for us the repository was essentially fully compliant, while RIOXX marked it as not compliant.)
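To illustrate how one field can swing the figures, here is a minimal sketch contrasting a lenient "is there a date?" check with a strict format check, and computing a compliance percentage with each. It assumes the RIOXX rule is an ISO 8601 date written as YYYY-MM-DD; the lenient layouts, the function names and the sample values are hypothetical and are not taken from the repository mentioned above.

```python
import re
from datetime import date, datetime
from typing import Callable, Iterable

# Lenient check (hypothetical stand-in for "is there a date?"): accepts a few common layouts.
LENIENT_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y"]


def is_any_date(value: str) -> bool:
    for fmt in LENIENT_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


# Strict check: assumes the rule is exactly YYYY-MM-DD and a real calendar date.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value: str) -> bool:
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2015-02-30
        return True
    except ValueError:
        return False


def compliance_rate(values: Iterable[str], check: Callable[[str], bool]) -> float:
    """Percentage of values that pass the given check."""
    values = list(values)
    return 100.0 * sum(check(v) for v in values) / len(values) if values else 0.0
```

With the toy values `["2015-03-01", "01/03/2015", "01/04/2015", "2015"]`, the lenient check gives 100.0 and the strict check gives 25.0: the same records, judged by two rules for a single field, produce very different compliance levels.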
What I like most is that if you click "show/hide messages", you can also see a detailed explanation of every record that is not compliant and why.
There is room for improvement.
We are working on implementing a more complete compliance check. We had a fruitful chat with Paul Walk, who gave us access to the open-source code of the RIOXX validation, and we are now implementing the rules defined in its GitHub repository in our own code to obtain a more thorough compliance check.
We are also working on feeding the aggregated results back to RIOXX and on expanding access to the service to other interested research stakeholders, such as funders.
In the future, the RIOXX metadata fields will be indexed and integrated into our API and dataset.
Thanks a lot, I am going to have a Guinness now.