This project has received funding from the European research infrastructures
(including e-Infrastructures) under the European Union's Horizon 2020 research
and innovation programme under grant agreement No 101017501
Research Lifecycle Management technologies for
Earth Science Communities and Copernicus users in EOSC
FAIR digital objects as a building block for
digitalization technology stacks
Raul Palma
RELIANCE Project Coordinator
Head of Data Analytics and Semantics Department
Poznan Supercomputing and Networking Center (PSNC)
4th EMMC International Workshop 2023
27th April 2023
A continuous, iterative and dynamic process followed by scientists for conducting, validating and
disseminating scientific knowledge
The research and Information Lifecycle
Motivation for FAIR digital objects & Open Science
A. Fouilloux et al. FAIR Research Objects for realising Open Science with RELIANCE EOSC Project
Need: mechanisms to manage data, methods and other resources that could: i) enhance the visibility of scientific
breakthroughs; ii) encourage reuse; and iii) foster broader accessibility of research
Goal: Account, describe and share everything about your
research, including how those things are related
Research objects
http://www.researchobject.org
Research outcomes and related resources
Each object has its own metadata and repositories
All are first class citizens and are required to make research FAIR
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
Encapsulated content and references to
external resources
Contextualized graph
Research objects: Self-describing, chiefly
metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
From FAIR data to FAIR Digital Objects
C. Goble, S. Soiland-Reyes. RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate as a FAIR Digital Object
C. Goble, S. Soiland-Reyes. RO-Crate: A framework for packaging
research products into FAIR Research Objects
Files: physical/links
RO: linked data/ZIP
Implementation of FAIR Digital Objects using RO-Crate
© Poznan Supercomputing and Networking Center
How do we describe the metadata?
• PIDs + JSON-LD + Schema.org descriptors
• Opinionated profile of schema.org
• Linked Data by Stealth: JSON with gradual path
to extensibility with LD – e.g. ad-hoc terms
• Example-driven documentation
How can I add additional metadata?
• Schema.org, domain ontologies
How do I define a checklist of what is
expected to be in a type of RO?
• RO-Crate Profiles
http://www.researchobject.org/ro-crate/
Strict Structure, Open ended content
Self-describing, chiefly metadata, objects
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
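The metadata approach above (JSON-LD with Schema.org descriptors) can be illustrated with a minimal sketch that builds an `ro-crate-metadata.json` document using only the Python standard library. The layout follows the RO-Crate 1.1 conventions (a metadata descriptor plus a root `Dataset`); the function name, file names, title and licence below are illustrative assumptions, not taken from the slides, and the result is not claimed to satisfy any particular RO-Crate profile.

```python
import json


def build_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate style metadata document as a dict.

    `files` maps a relative path inside the crate to a human-readable name.
    All argument values used below are hypothetical examples.
    """
    # Metadata descriptor: says which entity is the root of the crate.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root data entity: the research object itself, described with Schema.org terms.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    # One data entity per aggregated file.
    parts = [
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *parts],
    }


crate = build_crate_metadata(
    name="Example climate analysis",  # hypothetical title
    description="Notebook and results for an illustrative analysis.",
    date_published="2023-04-27",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files={
        "analysis.ipynb": "Analysis notebook",
        "data/results.csv": "Results table",
    },
)
print(json.dumps(crate, indent=2))
```

Because the document is plain JSON, it can be written and read without any linked-data tooling, which is the "Linked Data by Stealth" point made above.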
Summary: RO-Crate in a nutshell
[source RO-Crate: A framework for packaging research products into
FAIR Research Objects]
RO-Crate (Research Object Crate): Practical lightweight approach
to packaging research data entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use: Who What
When Where Why How.
Web Native Machine readable. Human readable. Search engine
friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
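As a rough illustration of how the contextual information (who, what, when) can be read back from a crate, the sketch below walks a metadata graph from the metadata descriptor to the root dataset and its author. The sample metadata, the ORCID identifier and the helper names are made up for the example; only the general RO-Crate 1.1 layout is assumed.

```python
def index_graph(metadata):
    """Index the flat @graph list by @id so references can be followed."""
    return {entity["@id"]: entity for entity in metadata["@graph"]}


def root_dataset(metadata):
    """Find the root data entity via the metadata descriptor's `about` link."""
    entities = index_graph(metadata)
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


def who_what_when(metadata):
    """Collect a few re-use oriented facts about the crate."""
    entities = index_graph(metadata)
    root = root_dataset(metadata)
    author = entities.get(root.get("author", {}).get("@id"), {})
    return {
        "what": root.get("name"),
        "who": author.get("name"),
        "when": root.get("datePublished"),
        "parts": [ref["@id"] for ref in root.get("hasPart", [])],
    }


# Hypothetical sample crate; the person and identifier are placeholders.
sample = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example climate analysis",
            "datePublished": "2023-04-27",
            "author": {"@id": "https://orcid.org/0000-0000-0000-0000"},
            "hasPart": [{"@id": "analysis.ipynb"}],
        },
        {
            "@id": "https://orcid.org/0000-0000-0000-0000",
            "@type": "Person",
            "name": "Example Researcher",
        },
        {"@id": "analysis.ipynb", "@type": "File", "name": "Analysis notebook"},
    ],
}

summary = who_what_when(sample)
print(summary)
```

The author is a contextual entity in the same flat graph, so following the reference is a dictionary lookup rather than a nested traversal.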
• Holistic solution for the management of Research
Objects
• Reference platform
• natively implements the RO-Crate model and paradigm
• supports different stakeholders, with the primary focus on scientists,
researchers, students and enthusiasts
• provides the backbone to a wealth of RO-centric applications and
interfaces across different scientific communities
ROHub overview
[Timeline figure: 2010-2013, 2014-2019, 2020+]
https://reliance.rohub.org/
Onboarded and
integrated in EOSC
ROHub enables users:
• to create and manage high-quality ROs that can be interpreted and reproduced in the future
• to reference, share and preserve resources related to scientific studies, campaigns and observations, including internal
ones, links to external ones, and other ROs (nested ROs)
• to collaborate with colleagues and to discover new knowledge via advanced exploratory search interfaces that exploit RO
metadata (both explicitly provided and automatically extracted from its content), as well as via a standard search API
(OpenSearch with Geo extensions)
• to manage the RO evolution, including generating snapshots and releases and allowing others to fork the RO
to reuse and extend it
• to publish the associated work and assign it a DOI so that it can be cited in scholarly communications
• to monitor and follow a particular RO, getting notifications about changes in its progress or quality
• to build reputation, since users can rate and favorite ROs created by others
• to find related works or researchers in a domain, e.g., for possible collaborations or reviews
High-level features
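The search feature above mentions OpenSearch with Geo extensions. As a minimal sketch of how a client might use such an interface, the code below fills an OpenSearch URL template, in which `{searchTerms}` is a required parameter, a trailing `?` marks an optional one, and `{geo:box}` carries a bounding box as west,south,east,north. The endpoint URL and the query parameter names are hypothetical; a real client would read the template from the service's OpenSearch description document.

```python
import re
from urllib.parse import quote

# Hypothetical template; a real one comes from the OpenSearch description document.
TEMPLATE = (
    "https://example.org/opensearch"
    "?q={searchTerms}&bbox={geo:box?}&startPage={startPage?}"
)


def fill_template(template, values):
    """Substitute OpenSearch template parameters with URL-encoded values.

    Optional parameters (marked with '?') that have no value become empty
    strings; a missing required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            # Keep commas readable, since geo:box is a comma-separated list.
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


url = fill_template(
    TEMPLATE,
    {"searchTerms": "sea ice", "geo:box": "5.0,58.0,12.0,62.0"},
)
print(url)
# https://example.org/opensearch?q=sea%20ice&bbox=5.0,58.0,12.0,62.0&startPage=
```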
ROHub and added value services
Semantic enrichment
readability, discoverability, reuse
Recommendation
content-based, concentric spheres
Research lifecycle & scholarly communication
collaboration, publication, citation, validation
Completeness assessment
HQ monitoring & preservation
Impact
Sharing, rating
Publish RO-crate
in EOSC
Publish
as PDF
FAIR assessment
RO and components level
ROHub connections with EOSC Core and other
Exchange services
Notebook
Binder
AAI
check-in
EOSC Resource Catalogue
• Anne, a scientist from Oslo, and her team
want to carry out climate change research
from an atmospheric perspective.
• Anne goes to the EOSC resource catalogue, where
she searches for existing results and finds an
executable RO.
• She opens the RO and finds a Jupyter
notebook that was used to analyze the data.
• Anne clicks on the notebook and it opens in EGI
Notebooks. She uses Data Cubes to exploit EO data
provided by the Copernicus programme, and saves
the results in EGI DataHub.
• Anne creates her own RO (forking the reused RO)
and starts to work on it, i.e., she aggregates the new
notebook, the data cube and other resources. The
new RO will also appear in EOSC Explore.
• Anne invites colleagues to contribute; the shared RO
keeps track of the provenance of contributions.
They can be notified when the RO is modified.
• Before publishing, they make a self-assessment
of the FAIRness of their
research and check the quality of the RO.
• Once the RO is ready, Anne makes a snapshot and
publishes it in Zenodo with a DOI. This new RO will
then also appear in the EOSC catalogue.
Find research work, access and reproduce it, reuse it
in new research, collaborate, assess quality and
publish it leveraging different EOSC services
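The fork, contribute and snapshot steps of the scenario above can be sketched as a small in-memory model. This is purely illustrative: it does not use the ROHub API or its Python library, and every class, method, identifier and the DOI below is a hypothetical placeholder chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ResearchObject:
    """Toy model of an RO; not the ROHub data model."""
    identifier: str
    title: str
    resources: Tuple[str, ...] = ()
    provenance: Tuple[str, ...] = ()
    forked_from: Optional[str] = None
    doi: Optional[str] = None
    is_snapshot: bool = False

    def fork(self, identifier, who):
        """Create a new live RO that remembers where it came from."""
        return replace(
            self,
            identifier=identifier,
            forked_from=self.identifier,
            provenance=self.provenance + (f"{who} forked {self.identifier}",),
            doi=None,
            is_snapshot=False,
        )

    def add_resource(self, who, resource):
        """Aggregate a resource and record who contributed it."""
        if self.is_snapshot:
            raise ValueError("snapshots are immutable")
        return replace(
            self,
            resources=self.resources + (resource,),
            provenance=self.provenance + (f"{who} added {resource}",),
        )

    def snapshot(self, identifier, doi):
        """Freeze the current state under a citable identifier."""
        return replace(self, identifier=identifier, doi=doi, is_snapshot=True)


original = ResearchObject("ro-1", "Existing analysis", resources=("analysis.ipynb",))
annes = (
    original.fork("ro-2", "Anne")
    .add_resource("Anne", "new-analysis.ipynb")
    .add_resource("Colleague", "datacube.nc")
)
published = annes.snapshot("ro-2-v1", doi="10.0000/hypothetical")  # placeholder DOI

for entry in annes.provenance:
    print(entry)
```

Using immutable objects means the reused RO is never altered by the fork, and a snapshot cannot drift after it has been assigned a DOI.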
EOSC services in support of OS
EOSC architecture overview
365 resources onboarded in EOSC as of the end
of October 2022 (source: EOSC Metrics Portal)
[Figure: EOSC architecture diagram. Labels: EOSC Exchange; EOSC Core (core coordination functions, core technical platform); EOSC Interoperability Framework; Execution Framework; EOSC Support Activities; Community A, Cluster B and Community C (thematic resources, thematic services); Regional D (regional resources, thematic services)]
Horizontal services; regional clusters; thematic clusters
Cluster services onboarded
eInfrastructures; INFRAEOSC-07; EOSC provisioning / expansion
Horizontal services and resources from 07 projects and e-infras onboarded
EOSC horizontal services are delivered (e.g. data, compute and other research-enabling services)
Ability to create thematic execution environments / VREs based on the integration of compliant thematic, horizontal and core resources
Thanks!
Raul Palma
rpalma@man.poznan.pl
• ROHub in EOSC marketplace: https://marketplace.eosc-portal.eu/services/psnc.rohub
• ROHub portal: https://reliance.rohub.org/
• ROHub tutorial: https://reliance-eosc.github.io/ROHUB-API_documentation/html/tutorials.html
• ROHub portal documentation: https://reliance-eosc.github.io/rohub-portal-documentation/
• ROHub API library documentation: https://reliance-eosc.github.io/ROHUB-API_documentation/html/index.html
• ROHub API library example Jupyter Notebooks: https://github.com/RELIANCE-EOSC/sample-notebooks
• ROHub helpdesk: https://support.pcss.pl/servicedesk/customer/portal/27 or support email: support@rohub.org
Onboarding and support resources

FDO as building block for digitization technology stacks


Editor's Notes

  • #3 I would like to introduce the motivation for research objects with an overview of the research lifecycle. Of course, we can think of or find more granular representations for different domains and applications, but in general the research lifecycle starts with… Science is built incrementally on results that can be reused and therefore reproduced for validation. [The “Earth Science Research and Information Lifecycle” can be defined as the continuous, iterative and ongoing process used by scientists for conducting, validating and disseminating scientific knowledge. It can undergo an unlimited number of iterations, which lead to the development of new and innovative ideas, concepts, techniques and technologies that ultimately benefit both science and society. The lifecycle can be summarized in four main phases that involve different categories of stakeholders: (1) Scientists access information (e.g. raw data or added-value products generated by colleagues) and share results; this relies on researchers and data providers giving access to the data and related knowledge. (2) Shared results and information are analysed, and interpretative models are generated and discussed with other colleagues (within the team and/or the wider community, which can include external stakeholders); this may require visualisation tools and data analytics. (3) Discussion leads to novel ideas and concepts that might need validation through further experimentation or data acquisition; this requires access to additional data sets held by other data providers. (4) New results are validated and shared (together with the workflow and processes used to generate them) for further discussion, including dissemination to external stakeholders (e.g. the general public and policy makers). This provides a stimulus for new research, bringing the process back to phase 1.]
  • #5 to #9 Including information on the underlying context and the relations between resources.
  • #10 RO-Crate provides a straightforward and lightweight implementation of FDOs, which are part of the long-term vision of EOSC. An FDO is defined as a sequence of bits that represents an informational unit and is presented according to the FAIR principles. An FDO is a unit of data that is able to interact with automated data processing systems. FDOs are accessed through their PID. They may receive requests for operations, which they may inherit from their type, as known from object-oriented programming. Through operations, their metadata can be accessed, which in turn describes the enclosed data content (a bit sequence). EOSC is a trusted and open virtual environment for the scientific community with seamless access to services (with the highest TRLs) addressing the whole research lifecycle. EOSC aims to provide 1.7m EU researchers with an environment of free, open services for data storage, management, analysis and re-use across disciplines: a web of FAIR data and services, and a federation of e-Infrastructures and Research Infrastructures (RIs). The EOSC Core is the set of enabling services needed to operate EOSC. The EOSC Exchange registers resources and services from research infrastructures, other EOSC projects and science clusters in EOSC and integrates them with the EOSC Core functionalities. The EOSC Interoperability Framework will provide guidelines for providers that want to integrate services or data into EOSC. It recommends beginning with a first iteration to establish a Minimum Viable EOSC (MVE) addressing the needs of publicly funded researchers exploiting openly available data.
  • #11 FDOs are accessed through their PID. They may receive requests for operations, which they may inherit from their type. Through operations, their metadata can be accessed, which in turn describes the enclosed data content (a bit sequence).  The content of a DO is encoded as a structured bit-sequence and stored in repositories. It is assigned a globally unique, persistent and resolvable identifier (PID), as well as rich metadata (descriptive, scientific, system, provenance, rights, etc.). Metadata descriptions themselves are DOs. Moreover, DOs can be aggregated to collections which are also DOs with a content consisting of the references to its components.
  • #13 Including information on the underlying context and the relations between resources.
  • #14 e.g., scientific investigations, campaigns and operational processes including latest RO-crate specification
  • #22 e.g., scientific investigations, campaigns and operational processes including latest RO-crate specification
  • #23 Exposes RO functionalities that can be used by other applications or by data scientists via (Jupyter) notebooks. Comprises: a backend service exposing a set of APIs; an IAM component integrated with EOSC AAI; a reference web client application; a Python library; EOSC integration (AAI, publishing, storage); and external RO added-value services (semantic enrichment & recommendation, checklist evaluation, quality monitoring, PDF generation, data cubes).
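
The notes for slides 10 and 11 describe FDOs as objects that are accessed through a PID, accept operations inherited from their type, expose metadata describing a bit sequence, and can be aggregated into collections that are themselves digital objects. A minimal sketch of that idea is shown below. It is a conceptual toy, not an implementation of any FDO specification: the type names, operation names, registry class and PIDs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    pid: str          # globally unique, persistent identifier
    type_name: str    # decides which operations the object supports
    metadata: dict    # descriptive, provenance, rights, etc.
    content: bytes = b""  # the enclosed bit sequence


# Hypothetical type hierarchy: every type ultimately inherits from "object".
TYPE_PARENTS = {"object": None, "dataset": "object", "collection": "object"}

# Operations registered per type; subtypes inherit those of their parents.
TYPE_OPERATIONS = {
    "object": {
        "get_metadata": lambda obj: obj.metadata,
        "get_size": lambda obj: len(obj.content),
    },
    "collection": {
        # A collection's content is the list of member PIDs, one per line.
        "list_members": lambda obj: obj.content.decode().splitlines(),
    },
}


def resolve_operation(type_name, operation):
    """Walk up the type hierarchy until the operation is found."""
    while type_name is not None:
        operations = TYPE_OPERATIONS.get(type_name, {})
        if operation in operations:
            return operations[operation]
        type_name = TYPE_PARENTS[type_name]
    raise LookupError(f"operation not supported: {operation}")


class Registry:
    """Toy PID resolver: maps PIDs to objects and dispatches operations."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        self._objects[obj.pid] = obj

    def invoke(self, pid, operation):
        obj = self._objects[pid]
        return resolve_operation(obj.type_name, operation)(obj)


registry = Registry()
# Placeholder PIDs, not real handles.
registry.register(
    DigitalObject("pid:example/data-1", "dataset", {"title": "Results"}, b"1,2,3\n")
)
registry.register(
    DigitalObject(
        "pid:example/coll-1",
        "collection",
        {"title": "Study collection"},
        b"pid:example/data-1",
    )
)

print(registry.invoke("pid:example/coll-1", "get_metadata"))
print(registry.invoke("pid:example/coll-1", "list_members"))
print(registry.invoke("pid:example/data-1", "get_size"))
```

Here the collection answers `get_metadata` only because its type inherits that operation from the base type, which mirrors the object-oriented analogy used in the notes.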