Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service

Presented at Open Repositories 2016 conference, Dublin, Ireland, 15th June 2016

  • MD) This is an overview of the CORE harvesting process. When harvesting, we run a set of tasks for every data provider (a data provider could be, for example, a repository). Here you can see some of the steps of the pipeline; the highlighted ones, the Metadata Download and the RIOXX Compliance tasks, are those affected by the adoption of RIOXX in CORE.
    I am going to say a bit more about the harvesting process. First, we download the metadata, mainly using the OAI-PMH protocol. In the past, our focus was on the OAI Dublin Core (oai_dc) format; now, when it is available, we choose the RIOXX format instead.

    EM) When we download RIOXX metadata, we process the XML file and store the metadata needed to index the record. We do this in the Extract Metadata task.

    RC) Further down the pipeline, in the RIOXX Compliance task, we check the compliance of the records against the RIOXX guidelines.
    Currently, we have a service that performs a simple validation of the XML records against a schema: we check the presence and type of the content, but not its quality. This validation is only a first step, but it gives repository managers useful information about their compliance levels.

  • This is what the RIOXX compliance section looks like in the dashboard.

    The results on the RIOXX webpage are based on a sample, while our results cover all the records in a repository. For validation purposes, we compared the RIOXX and CORE results and found them to be consistent.

    (Although our results and those on the RIOXX webpage are generally consistent, we also found some interesting cases where they differed significantly. For example, one repository was 99% compliant according to CORE but only 5% compliant according to the RIOXX page. When we investigated, we noticed that a single field caused this huge difference: RIOXX explicitly defines a date format, while we only check for a date, not its format. So, for us, the repository was essentially fully compliant, while RIOXX marked it as non-compliant.)

    What I like most is that if you click on “show/hide messages”, you can see a detailed explanation of every non-compliant record and why it fails.

  • There is room for improvement.

    We are working on a more complete compliance check. We had a fruitful chat with Paul Walk, who gave us access to the open-source code of the RIOXX validator, and we are now implementing the rules defined in its GitHub repository in our own code to achieve a more thorough compliance check.

    We are also working on feeding the aggregated results back to RIOXX and on expanding access to the service to other interested research stakeholders, such as funders.

    In the future, the RIOXX metadata fields will be indexed and integrated into our API and dataset.

    Thanks a lot, I am going to have a Guinness now.

  • Implementation of the RIOXX Metadata Guidelines in the UK's repositories through a harvesting service

    1. Implementation of the RIOXX metadata guidelines in the UK’s repositories through a harvesting service. Matteo Cancellieri & Nancy Pontika, CORE, The Open University, @oacore
    2. What is CORE
    3. What is CORE
    4. Facts: > 170 API users
    5. Facts: > 74 Repositories Dashboard users
    6. Facts: > 980 repositories
    7. Facts: > 8,900 journals
    8. Facts: > 53 languages
    9. Facts: > 30,058,914 metadata records
    10. Facts: > 2,874,683 full texts
    11. Aiming for the moon!
    12. CORE Repositories Dashboard
    13. RIOXX metadata: The RIOXX Metadata Application Profile provides a mechanism to help institutional repositories comply with the RCUK policy on open access. RIOXX focuses on applying consistency to the metadata fields used to record research funder and project/grant identifiers and is designed to support the consistent tracking of open-access research publications across scholarly systems. [Source: http://rioxx.net/]
    14. Introducing RIOXX in the CORE pipeline
    15. Introducing RIOXX in the CORE pipeline
    16. Introducing RIOXX in the CORE pipeline
    17. Introducing RIOXX in the CORE pipeline
    18. End Result
    19. Future work: complete compliance check
    20. Future work: feed back aggregated results to RIOXX
    21. Future work: give funders and repository managers the status of their repository
    22. Future work: show RIOXX metadata on the CORE display pages and API
    23. Future work: comments/suggestions
    24. Thank you! Matteo Cancellieri, Software Engineer, matteo.cancellieri@open.ac.uk; Nancy Pontika, Open Access Aggregation Officer, nancy.pontika@open.ac.uk; Website: http://core.ac.uk; Email: theteam@core.ac.uk; Twitter: @oacore
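The Metadata Download step described in the speaker notes (harvest over OAI-PMH, prefer RIOXX when a repository offers it, otherwise fall back to OAI Dublin Core) can be sketched in Python as follows. This is a minimal illustration, not CORE's code. It parses a repository's OAI-PMH `ListMetadataFormats` response with the standard library. Two things are assumptions: the rule that a format counts as RIOXX when `rioxx` appears in its `metadataPrefix` or namespace, and the `rioxx` prefix used in the toy response.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def available_formats(list_metadata_formats_xml: str) -> list[tuple[str, str]]:
    """Return the (metadataPrefix, metadataNamespace) pairs a repository advertises."""
    root = ET.fromstring(list_metadata_formats_xml)
    formats = []
    for fmt in root.findall(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS).strip()
        namespace = fmt.findtext("oai:metadataNamespace", default="", namespaces=OAI_NS).strip()
        formats.append((prefix, namespace))
    return formats


def choose_metadata_prefix(list_metadata_formats_xml: str) -> str:
    """Prefer a RIOXX format when one is advertised, otherwise fall back to oai_dc.

    Assumption: a format is treated as RIOXX if 'rioxx' appears in its
    prefix or namespace; real repositories may name it differently.
    """
    for prefix, namespace in available_formats(list_metadata_formats_xml):
        if "rioxx" in prefix.lower() or "rioxx" in namespace.lower():
            return prefix
    return "oai_dc"


# Toy ListMetadataFormats response (the 'rioxx' prefix and its namespace are assumptions).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>rioxx</metadataPrefix>
      <metadataNamespace>http://www.rioxx.net/schema/v2.0/rioxx/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(choose_metadata_prefix(SAMPLE))  # rioxx
```

The chosen prefix would then be passed as the `metadataPrefix` argument of an OAI-PMH `ListRecords` request to harvest the records themselves.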
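The Extract Metadata step (process the downloaded XML and keep the metadata needed to index a record) might look roughly like the sketch below. The prefix-to-namespace map is an assumption for illustration. The `dc` and `dcterms` URIs are the standard Dublin Core ones, but the `rioxxterms` URI should be checked against the RIOXX schema. The hypothetical `extract_fields` helper keeps only the first value of each field, so repeated fields such as multiple authors would need a list-valued variant.

```python
import xml.etree.ElementTree as ET

# Assumed prefix -> namespace map for this sketch. The dc and dcterms URIs are
# the standard Dublin Core ones; verify the rioxxterms URI against the RIOXX schema.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rioxxterms": "http://www.rioxx.net/schema/v2.0/rioxxterms/",
}


def extract_fields(record_xml: str) -> dict[str, str]:
    """Flatten a metadata record into {'prefix:localname': text}.

    Only elements in a known namespace with non-empty text are kept, and
    only the first value of a repeated element is stored.
    """
    uri_to_prefix = {uri: prefix for prefix, uri in NAMESPACES.items()}
    fields: dict[str, str] = {}
    for elem in ET.fromstring(record_xml).iter():
        if not isinstance(elem.tag, str) or not elem.tag.startswith("{"):
            continue  # skip comments and elements without a namespace
        uri, local = elem.tag[1:].split("}", 1)
        prefix = uri_to_prefix.get(uri)
        text = (elem.text or "").strip()
        if prefix and text:
            fields.setdefault(f"{prefix}:{local}", text)
    return fields


# Toy record; the un-namespaced <record> wrapper is skipped.
SAMPLE_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>An example title</dc:title>
  <dcterms:dateAccepted>2016-06-15</dcterms:dateAccepted>
</record>"""

print(extract_fields(SAMPLE_RECORD))
# {'dc:title': 'An example title', 'dcterms:dateAccepted': '2016-06-15'}
```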
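The speaker notes explain three things. The compliance check tests the presence and type of fields. CORE checks every record rather than a sample. A strict date-format rule can turn a 99% result into a 5% one. The sketch below reproduces that effect on toy data. It swaps the XML Schema validation mentioned in the notes for a plain Python check over already-extracted fields. The field list, the assumed strict `YYYY-MM-DD` rule, the lenient formats, and all helper names are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical subset of "mandatory" fields, for illustration only; the
# authoritative rules are in the RIOXX application profile.
REQUIRED_FIELDS = ("dc:title", "dcterms:dateAccepted", "rioxxterms:project")
DATE_FIELD = "dcterms:dateAccepted"

# Lenient check: any of these formats counts as "a date".
LENIENT_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m", "%Y")


def _parses(value: str, fmt: str) -> bool:
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def is_date_lenient(value: str) -> bool:
    return any(_parses(value, fmt) for fmt in LENIENT_DATE_FORMATS)


def is_date_strict(value: str) -> bool:
    # Assumed strict rule: exactly YYYY-MM-DD and a real calendar date.
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None and _parses(value, "%Y-%m-%d")


def record_problems(record: dict[str, str], strict_dates: bool) -> list[str]:
    """Explain why a record is non-compliant; an empty list means compliant."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS
                if not record.get(field, "").strip()]
    date_value = record.get(DATE_FIELD, "").strip()
    check = is_date_strict if strict_dates else is_date_lenient
    if date_value and not check(date_value):
        problems.append(f"{DATE_FIELD} has an invalid date: {date_value!r}")
    return problems


def compliance_rate(records: list[dict[str, str]], strict_dates: bool) -> float:
    """Percentage of compliant records, computed over every record (no sampling)."""
    if not records:
        return 0.0
    compliant = sum(1 for r in records if not record_problems(r, strict_dates))
    return 100.0 * compliant / len(records)


# Toy records; titles and project identifiers are made up.
records = [
    {"dc:title": "Paper A", "dcterms:dateAccepted": "2016-06-15", "rioxxterms:project": "p1"},
    {"dc:title": "Paper B", "dcterms:dateAccepted": "15/06/2016", "rioxxterms:project": "p2"},
    {"dc:title": "Paper C", "dcterms:dateAccepted": "2016", "rioxxterms:project": "p3"},
    {"dc:title": "Paper D", "dcterms:dateAccepted": "2016-06-15"},
]

print(compliance_rate(records, strict_dates=False))  # 75.0
print(compliance_rate(records, strict_dates=True))   # 25.0
print(record_problems(records[1], strict_dates=True))
```

Only the date rule changes between the two runs, yet the repository-level figure drops sharply, which mirrors the 99% versus 5% discrepancy described in the notes. The per-record messages returned by `record_problems` play the role of the dashboard's “show/hide messages” explanations.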
