
Implementation of the RIOXX Metadata Guidelines in the UK's repositories through a harvesting service


  1. Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service. Matteo Cancellieri & Nancy Pontika, CORE, The Open University, @oacore
  2. What is CORE
  3. What is CORE
  4. Facts: > 170 API users
  5. Facts: > 74 Repositories Dashboard users
  6. Facts: > 980 Repositories
  7. Facts: > 8,900 Journals
  8. Facts: > 53 Languages
  9. Facts: > 30,058,914 Metadata
  10. Facts: > 2,874,683 Full-text
  11. Aiming for the moon!
  12. CORE Repositories Dashboard
  13. RIOXX metadata: The RIOXX Metadata Application Profile provides a mechanism to help institutional repositories comply with the RCUK policy on open access. RIOXX focuses on applying consistency to the metadata fields used to record research funder and project/grant identifiers and is designed to support the consistent tracking of open-access research publications across scholarly systems. [Source: http://rioxx.net/]
  14. Introducing RIOXX in the CORE pipeline
  15. Introducing RIOXX in the CORE pipeline
  16. Introducing RIOXX in the CORE pipeline
  17. Introducing RIOXX in the CORE pipeline
  18. End Result
  19. Future work: Complete compliance check
  20. Future work: Feed back to RIOXX aggregated results
  21. Future work: Give funders and repository managers the status of their repository
  22. Future work: Show RIOXX metadata on the CORE display pages and API
  23. Future work: Comments/suggestions
  24. Thank you! Matteo Cancellieri, Software Engineer, matteo.cancellieri@open.ac.uk. Nancy Pontika, Open Access Aggregation Officer, nancy.pontika@open.ac.uk. Website: http://core.ac.uk Email: theteam@core.ac.uk Twitter: @oacore

Editor's Notes

  • MD) This is an overview of the CORE harvesting process. When harvesting, we run a set of tasks for every data provider, which could be a repository. Here (hand gesture) you can see some of the steps of the pipeline. The highlighted ones, the Metadata Download and the RIOXX Compliance tasks, are the ones affected by the adoption of RIOXX in CORE.
    I am going to say a bit more about the harvesting process. First we download the metadata, mainly using the OAI-PMH protocol. In the past our focus was on the OAI DC (Dublin Core) format, while now we choose the RIOXX format if it is available.


    EM) When we download the RIOXX metadata, we process the XML file and store the metadata needed to index the record. We do that in the Extract Metadata task (hand gesture).

    RC) Further down the pipeline, at the RIOXX Compliance task (hand gesture), we check the compliance of the records against the RIOXX guidelines.
    Currently we have a service that implements a simple validation of the XML records through a schema: we check the presence and type of the content, but not its quality. This validation is only a first step, but it gives repository managers useful information about compliance levels.
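
As a rough illustration of the two highlighted tasks, the sketch below picks a metadata format from an OAI-PMH ListMetadataFormats response and then runs a presence-only check on one record. It is not CORE's code. The prefix name `rioxx`, the function names, the sample XML and the short `REQUIRED_FIELDS` list are all assumptions made for the example, and the field list is only an illustrative subset of what the RIOXX profile requires.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Namespace prefixes used when looking up fields in a record.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rioxxterms": "http://www.rioxx.net/schema/v2.0/rioxxterms/",
    "ali": "http://www.niso.org/schemas/ali/1.0/",
}

# Assumption: an illustrative subset of fields, not the full profile.
REQUIRED_FIELDS = [
    "dc:title",
    "dc:identifier",
    "dcterms:dateAccepted",
    "rioxxterms:type",
    "ali:license_ref",
]


def choose_metadata_prefix(list_formats_xml, preferred="rioxx", fallback="oai_dc"):
    """Return `preferred` if the repository advertises it, else `fallback`."""
    root = ET.fromstring(list_formats_xml)
    advertised = {
        el.text.strip() for el in root.iter(OAI_NS + "metadataPrefix") if el.text
    }
    return preferred if preferred in advertised else fallback


def missing_fields(record_xml):
    """List the required fields that are absent or empty in one record."""
    root = ET.fromstring(record_xml)
    missing = []
    for field in REQUIRED_FIELDS:
        el = root.find(".//" + field, NS)
        if el is None or not (el.text or "").strip():
            missing.append(field)
    return missing


# Hypothetical sample data, shaped like an OAI-PMH response and a RIOXX record.
SAMPLE_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>rioxx</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

SAMPLE_RECORD = """<rioxx xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:rioxxterms="http://www.rioxx.net/schema/v2.0/rioxxterms/">
  <dc:title>An example article</dc:title>
  <dc:identifier>https://repository.example.org/1/</dc:identifier>
  <dcterms:dateAccepted>2016-03-01</dcterms:dateAccepted>
</rioxx>"""

if __name__ == "__main__":
    print(choose_metadata_prefix(SAMPLE_FORMATS))
    print(missing_fields(SAMPLE_RECORD))
```

A presence check like this says nothing about whether the values are sensible, which matches the "presence and type, but not quality" limitation described above.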





  • This is what the RIOXX compliance section looks like in the dashboard.

    The results on the RIOXX webpage are based on a sample, while our results cover all the records in a repository. For validation purposes, we compared the results from RIOXX and CORE and found them to be generally consistent.


    (We also had some interesting examples where the results were significantly different. For example, one repository was 99% compliant for us and only 5% for the RIOXX page. We investigated and noticed that a single field was causing this huge difference: RIOXX explicitly defines a date format, while we only check that the value is a date. So for us the repository was basically fully compliant, while RIOXX was marking it as not compliant.)

    What I like most about it: if you click “show/hide messages” you can also see, for every record that is not compliant, a detailed explanation of why.
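
The gap between the two compliance figures can be reproduced with a small sketch that compares a loose "is this a date at all?" check with a strict format check. This is an illustration only. The strict format is assumed here to be ISO 8601 `YYYY-MM-DD`, and the list of loosely accepted formats, the function names and the sample values are hypothetical.

```python
from datetime import datetime

# Assumption: formats a lenient checker might accept as "a date".
LOOSE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y"]


def is_some_date(value):
    """Loose check: the value parses as a date in any accepted format."""
    for fmt in LOOSE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def is_strict_date(value):
    """Strict check: the value is exactly a zero-padded YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts "2016-3-1", so insist on the padded length.
    return len(value) == 10


def compliance_rate(values, check):
    """Fraction of values that pass the given check."""
    return sum(1 for v in values if check(v)) / len(values)


# Hypothetical date values from one repository's records.
dates = ["01/03/2016", "2016-03-01", "15/07/2015", "2014"]

if __name__ == "__main__":
    print("loose :", compliance_rate(dates, is_some_date))
    print("strict:", compliance_rate(dates, is_strict_date))
```

With these sample values every record passes the loose check but only one in four passes the strict one, so a single field is enough to move a repository from almost fully compliant to mostly non-compliant.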

  • There is room for improvement.

    We are working on a more complete compliance check. We had a fruitful chat with Paul Walk, who gave us access to the open source code of the RIOXX validation, and we are implementing the rules defined in that GitHub repository in our own code to make the check more thorough.

    We are also working on feeding the aggregated results back to RIOXX and on expanding access to the service to other interested research stakeholders, such as funders.


    In the future the RIOXX metadata fields will be indexed and integrated with our API and dataset.


    Thanks a lot, I am going to have a Guinness now.
