Open Data analysis with EOSC-hub services

The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Baptiste Grenier

Published in: Science

  1. Open Data analysis with EOSC-hub services
     Baptiste Grenier / Enol Fernández, EGI Foundation
     eosc-hub.eu, @EOSC_eu
     Dissemination level: Public
     EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
  2. Thanks to the EOSC-hub distributed team!
     • Onedata and DataHub: Lukasz Dutka, Lukasz Opiola, Bartosz Kryza, Michal Orzechowski
     • EGI FedCloud provider: Boris Parak, Miroslav Ruda, Zdenek Sustr
     • EGI Check-in: Nicolas Liampotis
     • B2HANDLE: Kyriakos Ginis
     • B2FIND: Tobias Weigel, Claudia Martens
  3. What do we want to do?
     • Several of the use cases in EOSC-hub will enable scientific end-users to perform data analysis experiments on large volumes of data by exploiting a PID-enabled, server-side, and parallel approach.
     • Users expect easy-to-use interfaces, such as Jupyter Notebooks, for interacting with the system.
     • Results should be reusable and follow the FAIR guidelines: Findability, Accessibility, Interoperability, and Reusability.
  4. How?
     • Analysis
       ○ Notebooks / JupyterLab
       ○ FedCloud resources
     • Data management
       ○ DataHub / Onedata
         ■ Space
         ■ Onezone
         ■ Oneprovider
         ■ Oneclient
     • AAI (OIDC)
       ○ Check-in
     • PID management
       ○ B2HANDLE
       ○ Handle.net
     • Cataloguing and discovery
       ○ B2FIND
  5. Lessons Learned
     • Integrating multiple services from the EOSC-hub catalogue to build a new solution is worth the effort
       ○ Self-service APIs let you combine services with little overhead, although some steps still cannot be automated
       ○ Support channels with providers are life savers while prototyping
     • The setup still needs to be validated for production with a real research community
     • Aim at a completely integrated solution that people can reuse
       ○ Provide Python modules for easy interaction with services
       ○ Expand the EGI Notebooks service
       ○ Ensure that all required operations can be done using API calls
  6. Enabling reproducibility with Notebooks
     [Diagram] Labels: GitHub; Your repository; EGI Notebooks services; Zenodo; Data repository; MyBinder.org; Your laptop; Fellow researchers; Journal paper; DOI.
     Actions shown: Create repository; Upload ipynb file; Add requirements.txt; Execute; Re-execute; Obtain GitHub project reference; Provide GitHub project reference; Discover Notebook (use DOI).
  7. An Open Science story we aim for…
     [Diagram] Labels: GitHub; Your repository; EGI Notebooks and Binder service; Zenodo; Data repository; Your laptop; Fellow researchers; Journal paper; DOI; Distributed big data (DataHub, B2DROP, etc.).
     Actions shown: Create repository; Upload ipynb file; Add requirements.txt; Execute; Obtain GitHub project reference; Provide GitHub project reference; Discover Notebook (use DOI).
  8. Links
     • Onedata: https://onedata.org
     • EGI DataHub: https://datahub.egi.eu, http://egi-datahub.readthedocs.io/
     • EGI Notebooks: https://www.egi.eu/services/notebooks/, https://notebooks.egi.eu/
     • EGI Check-in: https://www.egi.eu/services/check-in/, https://wiki.egi.eu/wiki/AAI
     • B2FIND: https://eudat.eu/services/b2find, http://eudat7-ingest.dkrz.de/
     • B2HANDLE: https://eudat.eu/services/b2handle, https://hdl.grnet.gr:8001/api/handles
     • Binder: https://mybinder.org
  9. Thank you for your attention! Questions?
     Contact:
     • Enol Fernandez, enol.fernandez@egi.eu
     • Baptiste Grenier, baptiste.grenier@egi.eu
     eosc-hub.eu, @EOSC_eu
     This material by Parties of the EOSC-hub Consortium is licensed under a Creative Commons Attribution 4.0 International License.
  10. Demonstration flow
      1. Authenticating to DataHub using Check-in: https://datahub.egi.eu
         a. Showing content of space
      2. Authenticating to Notebooks using Check-in: https://cs3.fedcloud-tf.fedcloud.eu
         a. Showing content of mounted space
         b. Running Wind cast analysis notebook
         c. Running PID registration notebook to share and publish notebooks directory
      3. B2FIND cataloguing (data collected on a regular basis): http://eudat7-ingest.dkrz.de/dataset?groups=egidatahub
      4. OAI-PMH metadata in DataHub: http://datahub.egi.eu/oai_pmh?verb=ListRecords&metadataPrefix=oai_dc
      5. PID in Handle.net registry: http://hdl.handle.net/
      6. PID pointing to shared data publicly accessible in Onedata
  11. DataHub/Onedata Login with Check-in (OIDC)
  12. Check-in: IdP Selection and authentication
  13. IdP: Information Release consent
  14. Check-in: entitlements forwarded to the service
  15. DataHub: displaying spaces and providers
  16. DataHub: user space content
  17. Notebooks: Login with Check-in (OIDC)
  18. Notebooks: Jupyter Hub env
  19. Notebooks: Onedata space mounted locally
  20. Notebooks: wind casting using public dataset
  21. Notebooks: publishing data with PID using APIs
  22. Notebooks: sharing directory, minting PID
  23. B2FIND: discovery of harvested OAI-PMH metadata
  24. B2FIND: displaying an entry
  25. DataHub: Displaying OAI-PMH metadata
  26. Handle.net: the PID in the registry
  27. DataHub: the published dataset, from the PID
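
The demonstration flow (slide 10) lists an OAI-PMH `ListRecords` request against DataHub, which B2FIND harvests for cataloguing. The following is a minimal Python sketch of that step, not the code used in the demo: it builds the request URL shown on the slide and extracts titles and identifiers from a Dublin Core response. The function names and the sample XML record are hypothetical and only illustrate the standard OAI-PMH 2.0 response layout; real DataHub records may carry different fields.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample response, used here so the sketch runs without network access.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:record-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>http://hdl.handle.net/00.00000/EXAMPLE</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return one dict per record: OAI identifier, dc:title and dc:identifier values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "identifiers": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return records


if __name__ == "__main__":
    # URL from the demonstration flow slide.
    print(list_records_url("http://datahub.egi.eu/oai_pmh"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["oai_identifier"], rec["titles"], rec["identifiers"])
```

To try this against a live endpoint, the URL returned by `list_records_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_records`. A complete harvester would also need to follow OAI-PMH resumption tokens, which this sketch leaves out.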