EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
eosc-hub.eu
@EOSC_eu
Baptiste Grenier / Enol Fernández
EGI Foundation
Open Data analysis with EOSC-hub services
Dissemination level: Public
Thanks to the EOSC-hub distributed team!
Onedata and DataHub: Lukasz Dutka, Lukasz Opiola, Bartosz Kryza, Michal Orzechowski
EGI FedCloud provider: Boris Parak, Miroslav Ruda, Zdenek Sustr
EGI Check-in: Nicolas Liampotis
B2HANDLE: Kyriakos Ginis
B2FIND: Tobias Weigel, Claudia Martens
What do we want to do?
• Several of the use cases in EOSC-hub will enable scientific end-users to perform data analysis experiments on large volumes of data by exploiting a PID-enabled, server-side, and parallel approach.
• Users expect easy-to-use interfaces, such as Jupyter Notebooks, for interacting with the system.
• Results should be reusable and follow the FAIR guidelines: Findability, Accessibility, Interoperability, and Reusability.
How?
● Analysis
  ○ Notebooks / JupyterLab
  ○ FedCloud resources
● Data management
  ○ DataHub / Onedata
    ■ Space
    ■ Onezone
    ■ Oneprovider
    ■ Oneclient
● AAI (OIDC)
  ○ Check-in
● PID management
  ○ B2HANDLE
  ○ Handle.net
● Cataloguing and discovery
  ○ B2FIND
Lessons Learned
● Integrating multiple services from the EOSC-hub catalogue to build a new solution is worth the effort
  ○ Self-service APIs let you combine services without overhead, but some steps still cannot be automated
  ○ Support channels with providers are lifesavers while prototyping
● The setup still needs to be validated for production with a real research community
● Aim for a completely integrated solution that people can reuse
  ○ Provide Python modules for easy interaction with services
  ○ Expand the EGI Notebooks service
  ○ Ensure that all required operations can be done using API calls
Enabling reproducibility with Notebooks
[Workflow diagram]
Components: your laptop, your GitHub repository, EGI Notebooks service, Zenodo, data repository, MyBinder.org, fellow researchers, journal paper, DOI.
Labelled steps: download ipynb file; create repository; upload ipynb file; add requirements.txt; specify GitHub repo; generate DOI; execute; re-execute; obtain GitHub project reference; provide GitHub project reference; discover Notebook (use DOI).
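The "Re-execute" step on MyBinder.org starts from a GitHub project reference. As a minimal sketch of how such a reference can be turned into a launch link, the helper below builds a mybinder.org URL for a GitHub repository. The function name, the owner `example-user`, the repository `wind-analysis` and the notebook path are hypothetical and only serve as illustration; the repository is assumed to contain a requirements.txt so that Binder can rebuild the environment.

```python
from typing import Optional
from urllib.parse import quote

# Public Binder launch prefix for GitHub repositories.
BINDER_BASE = "https://mybinder.org/v2/gh"


def binder_url(owner: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Build a mybinder.org launch URL for a GitHub repository.

    owner, repo and ref identify the GitHub project reference;
    notebook optionally names an ipynb file to open in JupyterLab.
    """
    url = f"{BINDER_BASE}/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook:
        # The notebook path is percent-encoded, including any slashes.
        url += "?labpath=" + quote(notebook, safe="")
    return url


if __name__ == "__main__":
    # Hypothetical repository, used only to show the shape of the link.
    print(binder_url("example-user", "wind-analysis"))
    print(binder_url("example-user", "wind-analysis",
                     notebook="notebooks/analysis.ipynb"))
```

A link built this way can be shared next to the DOI so that fellow researchers can re-run the notebook without a local installation.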
An Open Science story we aim for…
[Workflow diagram]
Components: your laptop, your GitHub repository, EGI Notebooks and Binder service, Zenodo, data repository, distributed big data (DataHub, B2DROP, etc.), fellow researchers, journal paper, DOI.
Labelled steps: download ipynb file; create repository; upload ipynb file; add requirements.txt; specify GitHub repo; generate DOI (shown twice); execute; obtain GitHub project reference; provide GitHub project reference; discover Notebook (use DOI).
Links
- Onedata
▪ https://onedata.org
- EGI DataHub
▪ https://datahub.egi.eu - http://egi-datahub.readthedocs.io/
- EGI Notebooks
▪ https://www.egi.eu/services/notebooks/ - https://notebooks.egi.eu/
- EGI Check-in
▪ https://www.egi.eu/services/check-in/ - https://wiki.egi.eu/wiki/AAI
- B2FIND
▪ https://eudat.eu/services/b2find - http://eudat7-ingest.dkrz.de/
- B2HANDLE
▪ https://eudat.eu/services/b2handle - https://hdl.grnet.gr:8001/api/handles
- Binder
▪ https://mybinder.org
Thank you for your attention!
Questions?
Contact
Enol Fernandez - enol.fernandez@egi.eu
Baptiste Grenier - baptiste.grenier@egi.eu
This material by Parties of the EOSC-hub Consortium is licensed under a Creative Commons Attribution 4.0 International License.
Demonstration flow
1. Authenticating to DataHub using Check-in: https://datahub.egi.eu
   a. Showing content of space
2. Authenticating to Notebooks using Check-in: https://cs3.fedcloud-tf.fedcloud.eu
   a. Showing content of mounted space
   b. Running Wind cast analysis notebook
   c. Running PID registration notebook to share and publish notebooks directory
3. B2FIND cataloguing (data collected on a regular basis): http://eudat7-ingest.dkrz.de/dataset?groups=egidatahub
4. OAI-PMH metadata in DataHub: http://datahub.egi.eu/oai_pmh?verb=ListRecords&metadataPrefix=oai_dc
5. PID in Handle.net registry: http://hdl.handle.net/
6. PID pointing to shared data publicly accessible in Onedata
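The OAI-PMH step of the flow can be sketched in a few lines of Python. The example below rebuilds the ListRecords request URL given in the flow and extracts identifiers and titles from an oai_dc response. It is an illustration under assumptions, not the harvester used by B2FIND: the helper names are hypothetical, and the sample response is a hand-written placeholder (including `PREFIX/SUFFIX`), not real DataHub output. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the demonstration flow.
OAI_ENDPOINT = "http://datahub.egi.eu/oai_pmh"

# Standard OAI-PMH and Dublin Core XML namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(endpoint=OAI_ENDPOINT, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{endpoint}?{query}"


def parse_records(xml_text):
    """Return one dict per record found in an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "identifiers": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return records


# Hand-written placeholder response (assumption, not real DataHub output).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>http://hdl.handle.net/PREFIX/SUFFIX</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url())
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["oai_identifier"], rec["title"], rec["identifiers"])
```

In a real harvest the response text would come from an HTTP GET on the generated URL, and large result sets would need to be paged with the resumption token that OAI-PMH defines.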
DataHub/Onedata: Login with Check-in (OIDC)
Check-in: IdP selection and authentication
IdP: Information Release consent
Check-in: entitlements forwarded to the service
DataHub: displaying spaces and providers
DataHub: user space content
Notebooks: Login with Check-in (OIDC)
Notebooks: JupyterHub environment
Notebooks: Onedata space mounted locally
Notebooks: wind casting using public dataset
Notebooks: publishing data with PID using APIs
Notebooks: sharing directory, minting PID
B2FIND: discovery of harvested OAI-PMH metadata
B2FIND: displaying an entry
DataHub: displaying OAI-PMH metadata
Handle.net: the PID in the registry
DataHub: the published dataset, from the PID
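The last two steps, looking up the PID in the Handle.net registry and following it to the published dataset, can be sketched as follows. This is a minimal illustration, assuming the JSON layout returned by the Handle.net proxy REST interface under `/api/handles/`. The helper names, the handle `12345/example-suffix` and the sample response are hypothetical placeholders; no network call is made.

```python
import json
from urllib.parse import quote

# Global Handle.net resolver, as referenced in the demonstration flow.
RESOLVER = "http://hdl.handle.net"


def resolver_url(handle):
    """URL that redirects a browser to the location registered for the PID."""
    return f"{RESOLVER}/{quote(handle, safe='/')}"


def api_url(handle):
    """URL of the JSON view of the handle record on the resolver."""
    return f"{RESOLVER}/api/handles/{quote(handle, safe='/')}"


def target_url(api_response_text):
    """Extract the registered URL from a handle record, or None if absent."""
    payload = json.loads(api_response_text)
    if payload.get("responseCode") != 1:  # 1 means the handle was found
        return None
    for value in payload.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Hand-written placeholder record (assumption, not a real registry entry).
SAMPLE_RECORD = json.dumps({
    "responseCode": 1,
    "handle": "12345/example-suffix",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "https://datahub.example.org/share/abc"}},
    ],
})

if __name__ == "__main__":
    print(resolver_url("12345/example-suffix"))
    print(api_url("12345/example-suffix"))
    print(target_url(SAMPLE_RECORD))
```

In the demonstration, the registered URL is the public share of the Onedata directory, so opening the resolver URL leads directly to the published dataset.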
