This project has received funding from the European research infrastructures
(including e-Infrastructures) under the European Union's Horizon 2020 research
and innovation programme under grant agreement No 101017501
Research Lifecycle Management technologies for
Earth Science Communities and Copernicus users in EOSC
ARGOS integration in ROHub
Raul Palma
RELIANCE Project Coordinator
Head of Data Analytics and Semantics Department
Poznan Supercomputing and Networking Center (PSNC)
Community Call for ARGOS and Data Management Plans (DMPs)
22nd February 2023
• A continuous, iterative and dynamic
process involving the different stages
research data go through before, during,
and after a research project
Motivation: the Research Data Lifecycle
Source: https://www.reading.ac.uk/research-services/research-data-management/about-research-data-management/the-research-data-lifecycle
Motivation: challenges and vision
Goals
o Support creation and maintenance of machine-actionable DMPs
o Support data lifecycle stakeholders to actively manage DMPs and
related data from a single point
o Enable linking resources related to the data lifecycle (e.g.,
analyze/produce the data, publications, etc.)
o Enable a FAIRness assessment of DMP and related data
o Make DMPs and related resources discoverable in EOSC RG
Challenge
o support the research data lifecycle mgmt. from the perspective of the various
related stakeholders (e.g., researchers, data providers), who
o define how data will be handled during and after a research project,
o produce/collect this data, process and analyze it, preserve it, and make sure the data is
discoverable, accessible and reusable when appropriate
• Holistic solution for the management of ROs
• storage, lifecycle mgmt. & preservation of scientific outcomes
• share and make these resources available to others
• publish and release resources through a DOI
• discover and reuse pre-existing scientific knowledge.
• Reference platform
• natively implements the RO-Crate model and paradigm
• supports different stakeholders, with the primary focus on scientists,
researchers, students and enthusiasts
• provides the backbone to a wealth of RO-centric applications and interfaces
across different scientific communities
ROHub overview
Timeline: 2010-2013, 2014-2019, 2020+
https://reliance.rohub.org/
Onboarded and
integrated in EOSC
Goal: Account for, describe and share everything about your
research, including how those things are related
Research objects
http://www.researchobject.org
Research outcomes and related resources
Each object has its own metadata and repositories
All are first class citizens and are required to make research FAIR
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
Encapsulated content and references to
external resources
Contextualized graph
Research objects: Self-describing, chiefly
metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
Summary: RO-Crate in a nutshell
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
RO-Crate (Research Object Crate): Practical lightweight approach
to packaging research data entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use: who, what,
when, where, why, how.
Web native. Machine readable. Human readable. Search engine
friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
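To make the packaging idea concrete, the following is a minimal sketch of an RO-Crate 1.1 metadata document built with the Python standard library. The crate name, description and file entries are hypothetical; only the overall layout (a metadata descriptor plus a root dataset that aggregates local files and URI-addressable content) follows the RO-Crate 1.1 specification.

```python
import json


def build_ro_crate_metadata(name, description, parts):
    """Return a minimal RO-Crate 1.1 metadata document as a dict.

    `parts` is a list of (identifier, title) pairs; an identifier can be a
    relative file path or any URI-addressable resource.
    """
    graph = [
        {
            # Metadata descriptor: identifies this file and the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: the research object itself.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": identifier} for identifier, _ in parts],
        },
    ]
    # One data entity per aggregated resource.
    for identifier, title in parts:
        graph.append({"@id": identifier, "@type": "File", "name": title})
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


# Hypothetical example content.
crate = build_ro_crate_metadata(
    "Example research object",
    "Illustrative crate aggregating a local file and an external resource.",
    [
        ("data/observations.csv", "Observations"),
        ("https://example.org/report.pdf", "Report"),
    ],
)
print(json.dumps(crate, indent=2))
```

Writing the printed JSON to a file named `ro-crate-metadata.json` in the root folder of the aggregated content is what turns a plain directory into a crate.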
ROHub enables:
• to create and manage high-quality ROs that can be interpreted and reproduced in the future
• to reference, share and preserve resources related to scientific studies, campaigns and observations, including internal ones, links to external ones, and other ROs (nested ROs)
• to collaborate with colleagues and to discover new knowledge via advanced exploratory search interfaces that exploit RO metadata (both explicitly provided and automatically extracted from its content), as well as via a standard search API (OpenSearch with Geo extensions)
• to manage the RO evolution, including the ability to generate snapshots and releases and to allow others to fork the RO to reuse and extend it
• to publish the associated work and assign it a DOI to allow its citation in scholarly communications
• to monitor and follow a particular RO, getting notifications about its progress or quality changes
• to build reputation, by enabling users to rate and favorite ROs created by others
• to find related works or researchers in a domain, e.g., for possible collaborations or reviews
High-level features
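As a rough illustration of the OpenSearch with Geo extensions search listed among the features, the sketch below fills an OpenSearch-style URL template with a search term and a bounding box. The endpoint (`example.org`), the `bbox` query parameter name and the helper `fill_template` are assumptions made for this example; the real ROHub template should be read from the service's OpenSearch description document.

```python
import re
from urllib.parse import quote

# Hypothetical URL template; a trailing "?" marks an optional parameter.
TEMPLATE = "https://example.org/search?q={searchTerms}&bbox={geo:box?}&count={count?}"


def fill_template(template, values):
    """Substitute OpenSearch template parameters with URL-encoded values.

    Optional parameters without a value become empty strings; a missing
    required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), bool(match.group(2))
        if name in values:
            # Keep commas readable, since geo:box is a comma-separated list.
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([^{}?]+)(\?)?\}", substitute, template)


# geo:box is given as west,south,east,north in decimal degrees.
url = fill_template(
    TEMPLATE,
    {"searchTerms": "sea ice", "geo:box": "-10.0,35.0,5.0,45.0"},
)
print(url)
```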
ROHub and added value services
• Semantic enrichment: readability, discoverability, reuse
• Recommendation: content-based, concentric spheres
• Research lifecycle & scholarly communication: collaboration, publication, citation, validation
• FAIR/Quality assessment: HQ monitoring & preservation
• Social impact: sharing, quality
• Publish in EOSC
• Publish as PDF
• EOSC integration: AAI
ROHub connections with EOSC Core and other Exchange services
• All RELIANCE services are onboarded in the EOSC marketplace
• RELIANCE services integrate with and rely on different EOSC Core and
other EOSC Exchange services
[Diagram labels: Notebook, Binder, AAI check-in, EOSC Resource Catalogue]
ROHub architecture
Argos Integration
• Argos DMPs can be exported in different formats, including XML and JSON.
• ROHub can import Argos DMPs in XML (optionally complemented by JSON)
• Each imported DMP generates a research object
Argos Integration
• The imported RO includes
• All the information from the DMP in the form of human-readable (a subset)
and machine-readable metadata (reusing standard vocabularies)
• the datasets themselves (physically or by reference) or, if they
have not yet been created/collected at that point in time, a reserved space in
the research object to upload them once they become available.
Argos Integration
• New DMP versions can be propagated to the research object, generating a new version of the research
object itself:
• A snapshot of the research object is generated (before importing the new version)
• The updated DMP replaces all the
metadata in the research object
workflow and implementation
details
Argos Integration – Templates
1. ROHub processes an Argos
template (XML), extracting and
returning the questions and the IDs of
the answers in JSON.
2. The domain expert maps the answers
to predicates, reusing existing
vocabularies, i.e., DMP ontology, Dublin
Core, Schema.org. If no predicate from
those vocabularies is suitable, a new
predicate must be created.
3. The filled-in mapping (JSON) is imported
into ROHub
Admin operations
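The three template steps can be sketched as follows. The XML layout (`<template>`, `<field id=...>`, `<question>`), the field IDs and the helper names are hypothetical, since the actual Argos template schema and the ROHub admin API are not reproduced here; the Dublin Core and Schema.org predicate URIs are real terms chosen only as examples.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified template export.
TEMPLATE_XML = """
<template id="tpl-1">
  <field id="q1"><question>What data will be collected?</question></field>
  <field id="q2"><question>Under which licence will data be shared?</question></field>
  <field id="q3"><question>How will data be backed up?</question></field>
</template>
"""


def extract_questions(template_xml):
    """Step 1: list the questions and answer IDs, with an empty predicate slot."""
    root = ET.fromstring(template_xml)
    return [
        {
            "id": field.get("id"),
            "question": (field.findtext("question") or "").strip(),
            "predicate": None,
        }
        for field in root.iter("field")
    ]


def fill_mapping(entries, predicates):
    """Step 2: the domain expert assigns a predicate URI to each answer ID."""
    return [{**entry, "predicate": predicates.get(entry["id"])} for entry in entries]


def unmapped(entries):
    """Answer IDs that still need a (possibly newly created) predicate."""
    return [entry["id"] for entry in entries if entry["predicate"] is None]


entries = extract_questions(TEMPLATE_XML)
mapping = fill_mapping(
    entries,
    {
        "q1": "http://purl.org/dc/terms/description",
        "q2": "https://schema.org/license",
    },
)
# Step 3: this JSON document is what would be imported back.
print(json.dumps(mapping, indent=2))
print("still unmapped:", unmapped(mapping))
```

Keeping the predicate slot explicit in the extracted JSON makes it easy to spot answers for which no existing vocabulary term was found.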
Argos Integration – DMPs
1. ROHub imports a DMP (based on a supported
template). This operation expects the DMP in
XML. Optionally, the URL of the original DMP
and the DMP in JSON (to complement the XML)
can also be provided
2. It is possible to retrieve the folders that
correspond to the DMP datasets for a given
RO
3. It is possible to return all machine-readable
metadata for a given folder (DMP
dataset)
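A minimal in-memory sketch of these three operations is shown below. It is not the ROHub API: the XML layout, the shape of the complementary JSON, the mapping and the function names are all assumptions made for illustration. Each DMP dataset becomes a folder whose empty `resources` list plays the role of the reserved space for data that does not exist yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DMP export.
DMP_XML = """
<dmp>
  <title>Example DMP</title>
  <dataset id="ds-1">
    <label>Sea surface temperature</label>
    <answer field="q1">Satellite observations</answer>
  </dataset>
  <dataset id="ds-2">
    <label>Model output</label>
    <answer field="q2">CC-BY-4.0</answer>
  </dataset>
</dmp>
"""

# Hypothetical answer-to-predicate mapping produced from the template.
MAPPING = {
    "q1": "http://purl.org/dc/terms/description",
    "q2": "https://schema.org/license",
}


def import_dmp(dmp_xml, mapping, dmp_url=None, dmp_json=None):
    """Build a research object (plain dict) with one folder per DMP dataset.

    `dmp_json` is an optional {dataset_id: {predicate: value}} dict that
    complements values missing from the XML.
    """
    root = ET.fromstring(dmp_xml)
    extra = dmp_json or {}
    ro = {"title": root.findtext("title"), "source": dmp_url, "folders": {}}
    for dataset in root.iter("dataset"):
        metadata = []
        for answer in dataset.iter("answer"):
            predicate = mapping.get(answer.get("field"))
            if predicate:
                metadata.append((predicate, (answer.text or "").strip()))
        metadata.extend(extra.get(dataset.get("id"), {}).items())
        ro["folders"][dataset.get("id")] = {
            "name": dataset.findtext("label"),
            "metadata": metadata,
            "resources": [],  # reserved space: data can be uploaded later
        }
    return ro


def dataset_folders(ro):
    """Return the folder IDs that correspond to DMP datasets."""
    return list(ro["folders"])


def folder_metadata(ro, folder_id):
    """Return the machine-readable (predicate, value) pairs of one folder."""
    return ro["folders"][folder_id]["metadata"]


ro = import_dmp(DMP_XML, MAPPING, dmp_url="https://example.org/dmp/1")
print(dataset_folders(ro))
print(folder_metadata(ro, "ds-2"))
```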
Argos Integration – DMP upgrades
1. To propagate a new version of a DMP, it is
necessary to upgrade the DMP RO,
sending the same information as for a new
import
2. Considered scenarios:
• A new dataset is added
• An existing dataset is updated
• An existing dataset is deleted
3. For the last case, there is a constraint that the
dataset (RO folder) to be deleted cannot
contain any data
4. If the conditions are met, an RO snapshot is
generated, then all annotations and metadata of the
original RO are replaced with those from the
new DMP
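The upgrade rules above can be sketched as a single function. The data structures and the name `upgrade_dmp_ro` are hypothetical and only mirror the described behaviour: classify datasets as added, updated or deleted, refuse to delete a folder that contains data, take a snapshot, then replace the metadata.

```python
import copy


def upgrade_dmp_ro(ro, new_folders):
    """Apply a new DMP version to a research object held as a plain dict.

    ro:          {"folders": {id: {"metadata": [...], "resources": [...]}},
                  "snapshots": [...]}
    new_folders: {id: metadata list} taken from the new DMP version.
    Returns a summary of added, updated and deleted dataset IDs.
    """
    old_ids, new_ids = set(ro["folders"]), set(new_folders)

    # Constraint: a dataset folder can only be deleted if it holds no data.
    blocked = sorted(i for i in old_ids - new_ids if ro["folders"][i]["resources"])
    if blocked:
        raise ValueError(f"cannot delete non-empty folders: {blocked}")

    # Snapshot the current state before anything is replaced.
    ro["snapshots"].append(copy.deepcopy(ro["folders"]))

    # Replace all metadata; uploaded resources of kept folders are preserved.
    upgraded = {}
    for ds_id, metadata in new_folders.items():
        resources = ro["folders"].get(ds_id, {}).get("resources", [])
        upgraded[ds_id] = {"metadata": list(metadata), "resources": resources}
    ro["folders"] = upgraded

    return {
        "added": sorted(new_ids - old_ids),
        "updated": sorted(new_ids & old_ids),
        "deleted": sorted(old_ids - new_ids),
    }


ro = {
    "folders": {
        "ds-1": {"metadata": [("title", "old")], "resources": ["a.csv"]},
        "ds-2": {"metadata": [("title", "unused")], "resources": []},
    },
    "snapshots": [],
}
summary = upgrade_dmp_ro(ro, {"ds-1": [("title", "new")], "ds-3": [("title", "extra")]})
print(summary)
```

Checking the deletion constraint before taking the snapshot means a rejected upgrade leaves the research object, including its snapshot history, untouched.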
Argos Integration – ongoing and future work
1. Propagate changes in ROHub to Argos
2. Export existing RO to Argos
• Export RO to XML
• Import generated XML in Argos
3. Define Argos templates aligned with DMP
requirements at the national level
4. Extend Argos GUI to enable direct export to ROHub
5. Extend ROHub GUI to enable direct export to Argos
6. Provide automatic suggestions for mappings based
on NLP methods
7. Integrate national repositories in ROHub to allow
search/connect existing datasets to DMP or to add
new datasets from DMP
8. Improve/test/evaluate at national level, and potential
integration/alignment with national funding agencies'
DMPs
9. Assess the possibility of using an Argos API, if one is available
Argos Integration – clarification/issues found
• Exporting a DMP as an XML file:
• funder, grant and project IDs are internal IDs, while in the JSON file (the DMP
exported as RDA JSON) they are Zenodo IDs
• dataset descriptions are missing
• contact data is missing (it is present in the JSON file)
• there is a problem with profile (template) IDs: they do not correspond to the same profiles
across users
• some data is missing when compared to the JSON file, e.g., in a DMP based on the
National Science Center Poland template, the JSON file contains:
"commentFieldValue116df9c2-441d-36a3-1560-a5638c7c5778" : "user answer",
which is not present in the XML counterpart
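Discrepancies of this kind can be detected mechanically. The sketch below compares the field keys of a JSON export with the field IDs of an XML export and reports the keys that have no XML counterpart. The two sample documents are hypothetical and far simpler than real Argos exports; only the `commentFieldValue...` key is taken from the issue described above, and the assumption that XML fields carry an `id` attribute is made for illustration.

```python
import json
import re
import xml.etree.ElementTree as ET

# Keys that end with a UUID are treated as template field keys.
UUID_SUFFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

# Hypothetical, simplified exports of the same DMP.
XML_EXPORT = """
<dmp>
  <field id="116df9c2-441d-36a3-1560-a5638c7c5778">user answer</field>
</dmp>
"""
JSON_EXPORT = json.dumps({
    "dmp": {
        "116df9c2-441d-36a3-1560-a5638c7c5778": "user answer",
        "commentFieldValue116df9c2-441d-36a3-1560-a5638c7c5778": "user answer",
    }
})


def json_keys(node):
    """Collect every key found anywhere in a nested JSON structure."""
    keys = set()
    if isinstance(node, dict):
        for key, value in node.items():
            keys.add(key)
            keys |= json_keys(value)
    elif isinstance(node, list):
        for item in node:
            keys |= json_keys(item)
    return keys


def xml_field_ids(xml_text):
    """Collect the `id` attribute of every XML element that has one."""
    return {el.get("id") for el in ET.fromstring(xml_text).iter() if el.get("id")}


def missing_in_xml(json_text, xml_text):
    """Return the JSON field keys that have no counterpart in the XML export."""
    ids = xml_field_ids(xml_text)
    field_keys = (k for k in json_keys(json.loads(json_text)) if UUID_SUFFIX.search(k))
    return sorted(k for k in field_keys if k not in ids)


missing = missing_in_xml(JSON_EXPORT, XML_EXPORT)
print(missing)
```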
• ROHub in EOSC marketplace: https://marketplace.eosc-portal.eu/services/psnc.rohub
• ROHub portal https://reliance.rohub.org/
• ROHub tutorial: https://reliance-eosc.github.io/ROHUB-API_documentation/html/tutorials.html
• ROHub portal documentation: https://reliance-eosc.github.io/rohub-portal-documentation/
• ROHub API library documentation: https://reliance-eosc.github.io/ROHUB-API_documentation/html/index.html
• ROHub API library example Jupyter Notebooks: https://github.com/RELIANCE-EOSC/sample-notebooks
• ROHub helpdesk: https://support.pcss.pl/servicedesk/customer/portal/27 or support email: support@rohub.org
Onboarding and support resources
Thanks!
Raul Palma
rpalma@man.poznan.pl

Editor's Notes

  • #3 [The “Earth Science Research and Information Lifecycle” can be defined as the continuous, iterative and ongoing process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations, which lead to the development of new and innovative ideas, concepts, techniques and technologies that ultimately benefit both science and society. The life cycle can be summarized in four main phases that involve different categories of stakeholders: (1) Scientists access information (e.g. raw data or added value products generated by colleagues) and share results; this relies on researchers and data providers giving access to the data and related knowledge. (2) Shared results and information are analysed, and interpretative models are generated and discussed with other colleagues (within the team and/or the wider community, which can include external stakeholders); this may require the use of visualisation tools and data analytics. (3) Discussion leads to novel ideas and concepts which might need validation through further experimentation or data acquisition; this requires access to additional data sets held by other data providers. (4) New results are validated and shared (together with the workflow and processes used to generate them) for further discussion, including dissemination to external stakeholders (e.g. general public, policy makers). This provides stimulus to new research, bringing the process back to step 1.]
  • #4 updating them as needs evolve during the research project incl. information of the underlying context & relations between resources,
  • #5 e.g., scientific investigations, campaigns and operational processes including latest RO-crate specification
  • #6 incl. information of the underlying context & relations between resources,
  • #12 exposing RO functionalities that can be used by other applications or by data scientists via (Jupyter) notebooks. Comprises:
      • backend service exposing a set of APIs
      • IAM component integrated with EOSC AAI
      • reference web client application
      • Python library
      • EOSC integration (AAI, publishing, storage)
      • external RO added value services: semantic enrichment & recommendation, checklist evaluation, quality monitoring, PDF generation, data cubes