
Interoperability is the key: repositories networks promoting the quality and interoperability of repository metadata & vocabularies



Presentation by José Carvalho and Pedro Príncipe, University of Minho, at the ETD 2019 Conference (22nd International Symposium on Electronic Theses and Dissertations), Porto, Nov 7, 2019.


  1. 1. INTEROPERABILITY IS THE KEY: repositories networks promoting the quality and interoperability of repository metadata & vocabularies. José Carvalho and Pedro Príncipe, jcarvalho@sdum.uminho
  2. 2. AGENDA • Power of infrastructures and Repositories Networks - intro • RCAAP and OpenAIRE - overview • Feedback from the participants • Theses and Dissertations @ RCAAP and Portuguese use case • Theses and Dissertations @ OpenAIRE • PAUSE • Use cases and demos • TID workflow in RCAAP & Broker service from OpenAIRE • Tips for interoperability and trends • Discussions
  4. 4. Infrastructures supporting Open Science Open Science Policy and mandates Infrastructure Development
  5. 5. The RCAAP Project
  6. 6. Project Goals - Increase the visibility, accessibility and dissemination of Portuguese research results - Facilitate access to information about Portuguese scientific output - Integrate Portugal in the wide range of international initiatives in this domain
  7. 7. Services
  8. 8. National Harvester 1’996’023 Documents indexed from 235 Resources 121 Repositories 111 Journals + La Referencia Portal Harvested Search Portal - Project Website -
  9. 9. Theses in Portugal: management, workflow and interoperability
  10. 10. The Workflow • 01 Registration on RENATES: institutions register the thesis in the national registry of theses and a TID is attributed. • 02 Deposit on Repository: institutions deposit in a repository integrated in the RCAAP Network. • 03 Harvesting on Search Portal: the Search Portal harvests the theses and makes them available through the API. • 04 Validation on RENATES: the RENATES system checks the Portal for the TID and for the URL of the item in the repository.
  11. 11. What is RENATES? An administrative registry of theses, maintained by the institutions for the government.
  12. 12. 1 - Registration on RENATES - Institutions register the thesis on a national service and obtain an identifier (TID = Thesis Identifier) - After the work is concluded and accepted, the institution makes the deposit.
  13. 13. 2 - Deposit on Repository Institutions deposit in a repository integrated in the RCAAP Network and associate the TID (Thesis ID). The repository needs to be listed in the Search Portal Directory:
  14. 14. 2 - Deposit on Repository Submission process with specific forms and a new field: dc.identifier.tid
  15. 15. 2 - Deposit on Repository DSpace functionality - Get metadata from RENATES based on TID
  16. 16. 2 - OAI-PMH The repository exposes the TID
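A harvester could read the exposed TID roughly as follows. This is only a sketch: it assumes the TID appears in the `oai_dc` record as a `dc:identifier` value with a `TID:` prefix, which the slides do not confirm, and the sample record (including the handle) is made up for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative record only. The "TID:" prefix convention and the handle
# are assumptions, not taken from the presentation.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example thesis</dc:title>
  <dc:identifier>http://hdl.handle.net/0000/0000</dc:identifier>
  <dc:identifier>TID:201816962</dc:identifier>
</oai_dc:dc>
"""


def extract_tid(record_xml):
    """Return the TID found in a Dublin Core record, or None if absent."""
    root = ET.fromstring(record_xml.strip())
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        value = (element.text or "").strip()
        if value.upper().startswith("TID:"):
            return value[4:].strip()
    return None


print(extract_tid(SAMPLE_RECORD))
```

Records without a TID return `None`, which is the case the data curation work later in the deck tries to detect.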
  17. 17. 3 - Harvesting on Search Portal The national search portal RCAAP harvests all repositories every night and makes the thesis information available through the API. API Interface: positories=true&identifier=201816962
  18. 18. 4 - Validation on RENATES Every night, the RENATES service checks the API for the existence of any thesis with a particular TID not yet identified (or closed). When a TID is available on the portal, the RENATES systems get the URI (usually a handle) and close the process.
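The nightly check can be pictured as a simple reconciliation loop. The sketch below is hypothetical: `lookup_uri` stands in for the call to the portal API (whose full URL is truncated in the slide and is not reproduced here), and the handle value is invented.

```python
def close_pending_registrations(pending_tids, lookup_uri):
    """Split pending TIDs into those found on the portal and those still missing.

    lookup_uri is a hypothetical callable that returns the item URI for a TID,
    or None when the portal has not harvested that thesis yet.
    """
    closed = {}
    still_pending = []
    for tid in pending_tids:
        uri = lookup_uri(tid)
        if uri:
            closed[tid] = uri  # the registry can store the URI and close the process
        else:
            still_pending.append(tid)  # retried on the next nightly run
    return closed, still_pending


# Stand-in for the portal API: a dictionary of harvested TIDs (illustrative data).
harvested = {"201816962": "http://hdl.handle.net/0000/0000"}
closed, pending = close_pending_registrations(["201816962", "299999999"], harvested.get)
print(closed, pending)
```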
  19. 19. To be done: 1 - The RENATES service gets all available metadata and registers a DOI for the thesis that points to the repository item. The TID is converted to a DOI from the beginning, initially not active. RCAAP has a DOI registry at national level. 2 - Centralized digital preservation
  20. 20. Main Aspects - The TID is key to identify the thesis in the ecosystem. - ORCID identifiers are used in RENATES systems and also in some repositories - A persistent identifier is usually associated with the repository. - Registration of DOIs for each thesis is on the way! (with ORCIDs) - Funding information can be associated with theses! - Digital preservation is done at repository level (may be centralized in the future) - Metadata (based on OpenAIRE Guidelines v3) & ETD-MS (OpenAIRE 4 in the near future)
  21. 21. Legislation ● DL 115/2013 - digital legal deposit for theses ● 285/2015 - practical aspects ● FCT nº 14167/2015 - file formats
  22. 22. Compliance Rates
  23. 23. Data Curation for Theses Work based on repositories to identify - Theses without TID - Works not deposited. Lists sent to institutions so that they can comply. Mandatory since August 2013!
  24. 24. Support of the Community • Helpdesk by email and phone (provided by RCAAP Project) • Webinars • eLearning contents • Workshops at events
  25. 25. Validator RCAAP
  26. 26. Specific Profile for Thesis
  27. 27. Resource Profile Stats
  28. 28. Analysis of the Workflow
  29. 29. Building the OpenAIRE Research Graph (information space). Diagram labels: Content Providers (Publications repositories; Data repositories; Hybrid repositories; Registries; OA Journals; Software repositories; Research Infras); GUIDELINES; TERMS OF USE; Research Graph; Services; OpenAIRE Graph & Dashboards.
  31. 31. Doctoral thesis (1,601,144); Master thesis (1,184,953); Bachelor thesis (549,987)
  32. 32. The OpenAIRE research graph: providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities. Open; Complete; De-duplicated; Transparent; Participatory; Decentralized; Trusted
  33. 33. De-duplicated • Publications: products with “equivalent” PIDs, title, authors, dates are grouped • Datasets: products with “equivalent” PIDs are grouped • Software: products with “equivalent” PIDs and original URLs are grouped • Other products: products with “equivalent” PIDs, title, authors, dates are grouped
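Grouping by “equivalent” PIDs can be illustrated with a small union-find sketch. It assumes that two PIDs are equivalent when they match after trimming and lowercasing, which is a simplification chosen for this example; the slides do not describe the actual OpenAIRE matching rules, and title, author and date comparison is left out.

```python
from collections import defaultdict


def normalize_pid(pid):
    # Assumption for this sketch: equivalence means equality after trim + lowercase.
    return pid.strip().lower()


def group_by_shared_pid(products):
    """Group product ids that share at least one normalized PID (transitively).

    products: list of dicts with an 'id' and a list of 'pids'.
    Returns a sorted list of sorted id groups.
    """
    parent = {p["id"]: p["id"] for p in products}

    def find(x):
        # Find the group representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # normalized PID -> first product id seen with it
    for product in products:
        for pid in product["pids"]:
            key = normalize_pid(pid)
            if key in first_owner:
                union(first_owner[key], product["id"])
            else:
                first_owner[key] = product["id"]

    groups = defaultdict(list)
    for product in products:
        groups[find(product["id"])].append(product["id"])
    return sorted(sorted(group) for group in groups.values())


sample_products = [
    {"id": "a", "pids": ["10.1234/X"]},
    {"id": "b", "pids": ["10.1234/x ", "hdl:1/2"]},
    {"id": "c", "pids": ["hdl:1/2"]},
    {"id": "d", "pids": ["10.9/zzz"]},
]
print(group_by_shared_pid(sample_products))
```

Union-find keeps the grouping transitive: `a` and `c` share no PID directly, but both share one with `b`, so all three land in one group.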
  34. 34. Thesis and Funding Information in OpenAIRE
  35. 35. Thesis and Funding Information in OpenAIRE
  36. 36. Complete: fine-grained classification of Research Products • Publications: Article, Preprint, Report, … • Datasets: Dataset, Collection, Clinical Trials, … • Software: Research Software, … • Other Research Products: Service, Workflow, Interactive Resource, … Diagram labels: Institutional/publication repositories; Journals/publishers; Data repositories; Other Products repositories; Software repositories
  37. 37. OpenAIRE’s Guidelines for Open Science Content Providers
  38. 38. Evolution of OpenAIRE Guidelines • 2010: Literature Guidelines v1 • 2012: Literature Guidelines v2; Data Guidelines v1 • 2013: Literature Guidelines v3 • 2014: Data Guidelines v2 • 2015: CRIS-CERIF Guidelines v1 • 2018: Guidelines for institutional and thematic repositories v4.0; CRIS-CERIF v1.1 • 2018: Guidelines for Software Repositories; Other Research Products
  39. 39. Metadata Goals in OpenAIRE (Goal: Metadata Groups) • Discovery and Citability: Descriptive metadata • Accessibility and Reuse: Access Rights, License Conditions • Contextualization: Research Project, Linked Research Artefacts • Interoperability: Identifier for Entities, Controlled Vocabularies • Reporting: Funding Reference • TDM: File Location, License Conditions
  40. 40. Role of PIDs in OpenAIRE
  41. 41. Application Profile
  42. 42. Discovery and Citability
  43. 43. Discovery and Citability
  44. 44. Accessibility and Reuse
  45. 45. Contextualization
  46. 46. Interoperability
  47. 47. REPORTS
  48. 48. Metadata Quality Challenges (Issue / Affects / Proposed Solutions) • Missing values / Indexing, discovery, reuse / Curation by repository team; use OpenAIRE Validator, Broker service • Missing links and identifiers / Interlinking with other research products; contextualisation / ScholeXplorer, Broker service • Lack of controlled values / Discovery / Use agreed controlled vocabularies according to OpenAIRE Guidelines • Mandatory values only / Discovery and reuse / Broker service
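Two of these issues, missing values and uncontrolled values, lend themselves to a simple automated check before exposing records. The sketch below is illustrative only: the field names and the list of required fields are hypothetical, and the access-rights labels are used as an example of a controlled vocabulary rather than as the rule set of the OpenAIRE Validator.

```python
# Hypothetical flat record layout, chosen for this example only.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")

# Example controlled vocabulary for access rights (illustrative).
ACCESS_RIGHTS = {
    "open access",
    "embargoed access",
    "restricted access",
    "metadata only access",
}


def check_record(record):
    """Return a list of human-readable metadata issues for one record."""
    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing value: {field}")
    rights = str(record.get("rights") or "").strip()
    if rights and rights not in ACCESS_RIGHTS:
        issues.append(f"uncontrolled value in rights: {rights}")
    return issues


good = {"title": "T", "creator": "A", "date": "2019", "identifier": "x", "rights": "open access"}
bad = {"title": "T", "creator": "", "date": "2019", "identifier": "x", "rights": "free"}
print(check_record(good))
print(check_record(bad))
```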
  49. 49. References ● Guidelines at https://openaire-guidelines-for-literature-repository- ● Schema and examples on GitHub
  50. 50. @openaire_eu OpenAIRE content providers services
  51. 51. OpenAIRE’s e-infrastructure Commons – BROKER CONCEPT. Repositories in OpenAIRE may be interested in acquiring metadata about publications that are “potentially of interest to them”, i.e. that could be part of their collection, in order to add new records or enrich existing records with extra metadata. Diagram labels: CONTENT PROVIDERS (Publications repositories; Research Data repositories; CRIS systems; Registries (e.g. projects); OA Journals; Software Repositories); GUIDELINES; TERMS OF USE; SERVICES (Validation; Cleaning; De-duplication; Enrichment by inference); INFO SPACE (Project initiative; Funder; Funding; Result; Publication; Data; Software; Organization)
  52. 52. OpenAIRE Content Provider Dashboard – what it is. One-stop-shop web service where content providers (repositories, data archives, journals, aggregators, CRIS systems) interact with OpenAIRE. It provides front-end access to many of OpenAIRE's backend services.
  53. 53. Dashboard usage • TARGET USERS: Repository managers (literature, data), libraries, content providers, publishers, national aggregators. • USER BASE: 800 content providers have used the registration and validation service (V1 focused on literature repositories, 1200 all types for V2). • USER VALUE: Improve repository collections and content for enhanced visibility and access. Improved institution memory. Better institution research assessment. Compliance with funder rules. Improved repository interoperability.
  54. 54. OpenAIRE Content Provider Dashboard – what it does
  55. 55. Content Provider Dashboard
  56. 56. Interoperable metadata is key for effective content sharing. Use our validation service and see how you can apply the OpenAIRE Guidelines to expose your contents using global standards. VALIDATE
  57. 57. Reach a wider audience around the world. Register your datasource in OpenAIRE and be part of a global interlinked network. REGISTER
  58. 58. Improve your metadata. Get more connections. OA Broker service offers a wealth of information on scholarly communication data. ENRICH. Find out what interests you and subscribe to enrich your records. More & Missing events that may enrich your Repository: • Persistent identifiers • Open Access Versions • Projects • Subjects • Abstracts … datasets, software
  59. 59. Open research impact empowers Open Science. Open Metrics service by sharing your usage data. Get the benefit of an aggregated environment to broaden the mechanisms for impact assessment. MEASURE. Get usage statistics reports for your datasource
  61. 61. Support materials for Content Providers Dashboard uptake. Support – guides: • Provide - How to validate and register your repository • Provide - How to enrich research artifacts • Usage Statistics – How to track the usage activity of your repository • ScholeXplorer - Literature & Data interlinking • Making your repository Open. Training – webinars: • Make your content count - OpenAIRE Content providers Dashboard: service for repository managers • OpenAIRE metrics service: usage statistics • OpenAIRE Guidelines for data providers: new Metadata Application Profile for Literature Repositories
  63. 63. Tips for Interoperability & Future Trends
  64. 64. Integrated Vision Why? - Develop innovative services based on a national integrated information ecosystem How? - Based on international guidelines, identifiers, tools and services What? - Facilitate integrations between services (repositories, harvesters, funding, government, …) - Focus on the end-user (researcher & science manager)
  65. 65. Integrated Vision of the Repositories Network - Focus on community needs (and the support of the community) - Researcher/User Centric approach - Adoption of existing protocols, metadata schemas and guidelines - Focus on Metadata Quality - Get the added value from different services
  66. 66. National initiative to ensure the creation and sustained development of a national integrated information ecosystem to support research management
  68. 68. Identifiers in PT-CRIS • People -> Ciência ID; ORCID • Organizations -> ISNI • Publications -> DOI / Handle • Funding -> DOI/IDs • Theses -> TID
  70. 70. Integrated Vision Why? - Develop innovative services based on a national integrated information ecosystem How? - Based on international guidelines, identifiers, tools and services What? - Facilitate integrations between services (repositories, harvesters, funding, government, …) - Focus on the end-user (researcher & science manager)
  71. 71. Guidelines
  72. 72. GUIDELINES - Initially DRIVER Guidelines, then OpenAIRE Guidelines - Working on implementation of OpenAIRE 4 Literature Guidelines - COAR Taxonomies for Document Types, Access Types and Versions
  73. 73. IDENTIFIERS ...
  74. 74. Adopted Standards
  75. 75. Tools
  76. 76. NATIONAL HARVESTER - SEARCH PORTAL La Referencia Software -
  77. 77. OPENAIRE SERVICES • OpenAIRE Content Provider - Broker Service • Projects API • Interoperability Guidelines • OpenAIRE Validator
  78. 78. Repository Software - DSpace. For now, DSpace 5, but shaping DSpace 7! • Integrates the concept of Entities • Will be OpenAIRE 4 compliant • Use of API as main integration endpoint
  79. 79. APIs Use of existing information from “authoritative” data sources: - OpenAIRE API project list - Sherpa / Romeo (DSpace) - RENATES - ...
  80. 80. OAI-PMH VALIDATOR Multiple validation profiles
  81. 81. Validation Reports • At Search Portal level • One report for each harvesting process • Shows validated records, transformations and errors by type
  82. 82. Integrations with other systems - Always… - Using existing metadata profiles / mapping - Adopting existing protocols (OAI-PMH, SWORD, REST API, ...)
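As an example of adopting an existing protocol, an OAI-PMH harvester pages through `ListRecords` responses with a `resumptionToken`. The helper below only builds the request URLs; the base URL is a placeholder and the function name is invented for this sketch. When a resumption token is sent, the protocol treats it as an exclusive argument, so `metadataPrefix` and `set` are omitted.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A follow-up request carries only the verb and the resumptionToken.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; a real repository publishes its own OAI-PMH base URL.
BASE = "https://repository.example.org/oai/request"
print(list_records_url(BASE))
print(list_records_url(BASE, resumption_token="abc"))
```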
  83. 83. PTCRIS Sync - Framework for synchronization with ORCID. Diagram labels: Curriculum Vitae; ORCID API; Any other CRIS System
  84. 84. Requirements: we identified five use cases, related to claim tasks, deposits in repositories from external sources, synchronization, authority control for entities (authors, organizations, funding) and data curation
  85. 85. Outputs • 1 Claim • 2 Deposit (from external sources) • 3 Synchronization • 4 Authority Control • 5 Data Curation
  86. 86. Implementation • 1 Claim: 1.a Binding CID to RI user; 1.b Claim Author Profile; 1.c Work Claim • 2 Deposit: 2.a Authentication; 2.b Select collection; 2.c Repository selection; 2.d File upload; 2.f Licensing; 2.g Deposit • 3 Synchronization: 3.a Sync RI; 3.b Sync Portal • 4 Authority control: i CRIS; ii RI; iii Portal (a Authors; b Org IDs; c Funding) • 5 Data curation: I CRIS; II RI; III Portal (a Authors; b Org IDs; c Funding)
  87. 87. Authentication: federated authentication based on the “National Researcher Identifier” - Ciência ID (which aggregates other author identifiers such as ORCID, SCOPUS, Researcher ID, ...). 1 Claim: a Binding CID to RI user; b Claim Author Profile; c Work Claim
  88. 88. From a system to the Repository - From departments to institutional repository - From Curriculum Vitae to Repository. Diagram labels: User or System; SWORD V2 API; Repository
  89. 89. 2. Direct deposit from CV (2 Deposit). Wizard that lets the user: • Choose repository • Choose collection • Choose FT file and access type • Enter funding information • Agree to the deposit licence
  92. 92. 4. Authority Control for Authors (4.a Authors): possibility to associate an author name with a unique identifier (ORCID and/or Ciência ID)
  93. 93. This feature calls Ciência ID or ORCID to obtain information about authors
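Before an author name is linked to an ORCID iD, a repository can cheaply reject mistyped identifiers by verifying the check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The sketch below shows that check only; it does not call the ORCID or Ciência ID services, and the function names are chosen for this example.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the format and check character of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A valid check character only shows that the identifier is well formed; confirming that it belongs to the right author still needs the lookup described in the slide.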
  94. 94. From the Repository to Other Systems. Diagram labels: Repository; OAI-PMH; National Harvester; REST API; User on Curriculum Service; RENATES
  95. 95. 1.c - Work Claim (1 Claim: 1.a Binding CID to RI user; 1.b Claim Author Profile; 1.c Work Claim). A Ciência Vitae user can import works from repositories (via the RCAAP Portal)
  96. 96. Data Curation - Connecting author names with ORCID iDs - Converting project IDs into project entities, e.g. info:eu-repo/grantAgreement/EC/FP7/612425/EU
  97. 97. OpenAIRE 4 on repositories: author information is stored in repositories using the OpenAIRE 4 schema and will be harvested in the same format by the RCAAP portal.
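Converting a project identifier string into a structured project entity starts with splitting the `info:eu-repo/grantAgreement/` value into its parts. The sketch below follows the segment order used by the OpenAIRE literature guidelines (funder, funding programme, project ID, then optional jurisdiction, project name and acronym); the dictionary keys are names picked for this example, and project names containing an escaped slash are not handled.

```python
PREFIX = "info:eu-repo/grantAgreement/"

# Key names are illustrative; the segment order follows the projectID syntax.
SEGMENT_KEYS = (
    "funder",
    "funding_program",
    "project_id",
    "jurisdiction",
    "project_name",
    "project_acronym",
)


def parse_grant_agreement(value):
    """Split a grantAgreement identifier into named parts, or return None."""
    if not value.startswith(PREFIX):
        return None
    parts = value[len(PREFIX):].split("/")
    # Funder, funding programme and project ID are required.
    if len(parts) < 3 or not all(parts[:3]):
        return None
    return {key: part for key, part in zip(SEGMENT_KEYS, parts) if part}


print(parse_grant_agreement("info:eu-repo/grantAgreement/EC/FP7/612425/EU"))
```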
  98. 98. Integrated Vision of the Repositories Network
  99. 99. Services Provided by the Repository Infrastructure • National Funder: identify and report publications with national funding, for reporting and evaluation. • Theses & Dissertations: support the legal deposit of theses and dissertations. • Synchronization: repositories use the national harvester as a proxy to other services (validation, improvement, integration with CV, ORCID, Funder, Thesis & Dissertation legal deposit, ...)
  100. 100. New Approaches • Hierarchical Metadata (Entities Concept): the need to describe specific concepts more deeply (authors, funding, journals, events, affiliations, ...) • Relations focused on known identifiers: use of ORCID, ISNI, project IDs, ISSN, ISBN, DOIs, ... • Reproducible local data model: possibility to reproduce entities and relations in other systems (from the repository to the harvester)
  101. 101. New Approaches • Concept of a living item: it may be improved, updated and related over time by third-party services (like linking research data to theses…) • Need for international alignment on guidelines, protocols and data models, to make research management really global
  102. 102. Practical Example of the Integrated Vision. Diagram labels: Curriculum Vitae; CRIS System; National Harvester; Search Portal; Institutional Repositories; PTCRIS Sync; Broker; RENATES; Thesis
  103. 103. Reproduce Network Model
  104. 104. How to reproduce the network? You have access to… - Services - Software - Guidelines - Protocols - Use cases - And an open community! What more do you need?
  105. 105. Interoperability is Key to an integrated research and added value services!
  106. 106. Thanks! José Carvalho