Science Demonstrator Panel
Session 1 on Life Sciences
PanCancer Science Demonstrator
Sergei Yakneen, EMBL
www.eoscpilot.eu
The European Open Science Cloud for Research pilot project is funded by
the European Commission, DG Research & Innovation under contract no.
739563
The Science Challenge
- Collect Next Generation Sequencing Data from several
cohorts of cancer patients generated at multiple
sequencing centres and across multiple cancer types.
- Reanalyze the data using a uniform and consistent data
processing pipeline utilizing established best practices
from the International Cancer Genome Consortium (ICGC).
- Analyze the integrated data set to identify patterns of
germline and somatic mutation that act across cancer
types in a PanCancer fashion.
The Science Demonstrator
- Utilize Butler, a cloud-based large-scale scientific workflow
framework developed in the context of ICGC’s Pan-Cancer Analysis
of Whole Genomes (PCAWG) project, to perform a coordinated data
analysis across multiple clouds.
  - Code - https://github.com/llevar/butler
  - Paper - https://doi.org/10.1101/185736
- Perform automated, repeatable deployment and configuration of
the entire processing infrastructure at three academic cloud
computing environments.
  - EMBL-EBI Embassy Cloud
  - Compute Canada West Cloud
  - Cyfronet
- Deliver a large dataset (>50 TB) to each cloud computing centre.
- Use Butler to run PanCancer pipelines and monitor progress.
Successes
Resource | EMBL/EBI Embassy | Compute Canada | Cyfronet
vCPU     | 1000 | 1000 | 700
RAM      | 4 TB | 4 TB | 2.6 TB
Disk     | 1 PB | 150 TB | 200 TB
Data     | 448 samples from 224 prostate cancer donors | 422 samples from 211 pediatric brain tumour donors | 2081 samples from the 1000 Genomes Project
Raw data | 71 TB | 62 TB | 50 TB
Status   | Alignment and variant calling completed | Alignment and variant calling completed | Alignment completed
- Developed configurations for each cloud - https://github.com/llevar/eosc_pilot
- Developed extensive documentation and examples - https://butler.readthedocs.io/en/latest/
- Developed Butler self-healing capabilities.
- Performed data staging via Cyfronet Onedata.
Issues
- The biggest issue encountered by the SD was the initial
shortage of resources for operating at “cloud scale”.
  - Used 20% of the data set that was used for PCAWG.
  - < 0.5% of the data set for the 100k Genomes Project.
- Repeatable provisioning of large clusters of VMs.
  - >10% of provisioning jobs experience failures.
- Data movement and staging.
  - A 50 TB data set takes up to two weeks to move between locations.
  - Genomics data requires encryption and network security
  measures.
- Shared access to network-accessible storage creates
processing bottlenecks.
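As a rough check on the data movement figure above, the sustained transfer rate implied by moving 50 TB in two weeks can be worked out directly. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a full 14 days of continuous transfer; the function name is illustrative and not part of any tool mentioned in the slides.

```python
def sustained_rate(total_bytes: float, seconds: float) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second) for a transfer."""
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


TB = 1e12                    # assumption: decimal terabytes
TWO_WEEKS = 14 * 24 * 3600   # assumption: transfer runs continuously for 14 days

mb_s, mbit_s = sustained_rate(50 * TB, TWO_WEEKS)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # 41.3 MB/s, 331 Mbit/s
```

Under these assumptions the effective rate is well below what a 1 Gbit/s link could carry, which is consistent with encryption, security measures and shared storage being named as limiting factors.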
Lessons Learned
- Effectively supporting life sciences use cases like cancer
genomics will require A LOT of resources.
- Diverse data-sets have diverse data handling requirements, thus
it is better to provide a variety of tools to make solutions with
rather than a single “solution”.
- Automated detection and resolution of issues with
infrastructure (a la Butler self-healing) are imperative for
effective operation at cloud-scale.
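To illustrate what automated resolution of infrastructure failures can look like in its simplest form, the sketch below retries a failing provisioning job with exponential backoff. It is not Butler's actual self-healing mechanism; `provision_with_retry` and `FailsFirst` are hypothetical names, and the stand-in job fails a fixed number of times so the behaviour is deterministic.

```python
import time
from typing import Callable


def provision_with_retry(provision: Callable[[], bool],
                         max_attempts: int = 3,
                         base_delay: float = 1.0) -> int:
    """Call provision() until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if provision():
            return attempt
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"provisioning failed after {max_attempts} attempts")


class FailsFirst:
    """Hypothetical stand-in for a provisioning job that fails its first n calls."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures

    def __call__(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


attempts_used = provision_with_retry(FailsFirst(2), base_delay=0.0)
print(attempts_used)  # 3

# If failures were independent at a 10% rate (an assumption), the chance that
# all three attempts of one job fail would be 0.1 ** 3.
print(f"{0.1 ** 3:.3f}")  # 0.001
```

A real system would also need to distinguish transient failures from permanent ones before retrying; that decision is outside the scope of this sketch.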
EGA – FAIR Genomic Datasets
Tony Wildish, on behalf of Nino Spataro and the EGA-CRG team
The Science Challenge
The principal objectives of our SD are:
i. Test the feasibility of data reproducibility in genomics
ii. Prove the possibility to remaster genomic datasets
iii. Render genomic datasets more FAIR
The Science Demonstrator
How we did it:
- Implementing portable containerized genomic pipelines
- Using a language enabling scalable and reproducible scientific workflows
(Nextflow, available at https://www.nextflow.io/)
- Storing the pipelines in a public repository together with metadata
describing each pipeline step and the tools and versions used
Successes
- Genomic pipeline portability
Pipelines were successfully implemented and executed on a third-party infrastructure.
- Genomic pipeline FAIRification
Pipelines were deposited together with metadata describing the variables relevant
for pipeline description and re-use.
Pipelines available at:
https://dockstore.org/workflows/github.com/CRG-CNAG/EOSC-Pilot
- Feasibility of reproducibility and remastering in genomics
Overall, 97.38% of the obtained variants were shared and 99.66% of the called genotypes
agreed perfectly.
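To make the two percentages concrete, one way such figures could be computed is sketched below. It assumes each call set maps a variant key (chromosome, position, reference allele, alternate allele) to a genotype string, that "shared" means present in both call sets relative to their union, and that genotype agreement is measured over shared variants only. These definitions and the toy data are assumptions; the slides do not state the exact method used.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def concordance(original: dict[Variant, str],
                rerun: dict[Variant, str]) -> tuple[float, float]:
    """Return (fraction of variants shared, fraction of shared genotypes that agree)."""
    shared = original.keys() & rerun.keys()
    union = original.keys() | rerun.keys()
    if not union:
        raise ValueError("both call sets are empty")
    shared_fraction = len(shared) / len(union)
    if not shared:
        return shared_fraction, 0.0
    agreeing = sum(1 for variant in shared if original[variant] == rerun[variant])
    return shared_fraction, agreeing / len(shared)


# Toy call sets (made up for illustration only).
original_calls = {
    ("1", 100, "A", "G"): "0/1",
    ("1", 200, "C", "T"): "1/1",
    ("2", 300, "G", "A"): "0/1",
    ("2", 400, "T", "C"): "0/1",
}
rerun_calls = {
    ("1", 100, "A", "G"): "0/1",
    ("1", 200, "C", "T"): "1/1",
    ("2", 300, "G", "A"): "1/1",   # same site, different genotype
    ("3", 500, "A", "C"): "0/1",   # only found in the rerun
}

shared_fraction, genotype_agreement = concordance(original_calls, rerun_calls)
print(f"{shared_fraction:.2%} shared, {genotype_agreement:.2%} genotypes agree")
```

Choosing the union as the denominator penalises variants missing from either run; using only the original call set as the denominator would be an equally defensible choice.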
Issues
- Original versions of some software were unavailable
Solved by using the closest available version
- Size of the selected dataset to replicate
Solved by limiting replication to a subset of the original data
- Time-consuming understanding of original pipelines
The absence of consolidated standards for storing and describing the original pipelines
slowed down the pipeline implementation process
Lessons Learned
- Reproducibility is a time-consuming task on both the implementation and
computational sides.
- Universal methods to describe pipelines are required, along with long-term
repositories to keep the whole experiment reproducible.
- A FAIR-compliant semantic repository in which to represent objects and their
relationships is missing in the EOSC ecosystem.
- Open science is still not perceived as a scientific obligation by scientific
stakeholders. Continuous training and education are required to form a new
generation of scientists.
CryoEM
Carlos Oscar Sorzano (CSIC)
The Science Challenge
CryoEM researchers aim to improve the reproducibility of their image
processing workflows by producing a Scipion workflow file that describes
the image processing steps. This allows full reproduction of the same
results when the data is reprocessed outside the microscope facility.
The description can also be uploaded to public databases, so that other
users can understand the process followed to achieve a given structure.
The Science Demonstrator
• Adapt Scipion (an image processing workflow engine) to
report thoroughly, in a JSON file, all the inputs,
outputs, and parameters used, so that the same processing
can be reproduced.
• Adapt Scipion to reproduce an already existing
workflow, producing exactly the same results as in the first
run.
• Connect Scipion to a public database (Electron
Microscopy Data Bank) to allow users to
submit their results automatically.
• Allow other users to visualize the workflows performed by
other scientists.
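The idea of a JSON report listing every step's inputs, outputs and parameters can be sketched as follows. The schema, step names and parameter names here are hypothetical and are not Scipion's actual export format; the fingerprint is simply one way to check that a reloaded description is identical to the original and that any parameter change is detected.

```python
import hashlib
import json


def workflow_fingerprint(workflow: list[dict]) -> str:
    """Hash a canonical JSON form so equal descriptions give equal fingerprints."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical two-step workflow description.
workflow = [
    {
        "step": "import_movies",
        "inputs": [],
        "outputs": ["movies"],
        "parameters": {"sampling_rate_A": 1.0},
    },
    {
        "step": "motion_correction",
        "inputs": ["movies"],
        "outputs": ["micrographs"],
        "parameters": {"patches": 5},
    },
]

# Export to JSON text, then reload as another user would.
exported = json.dumps(workflow, indent=2)
reloaded = json.loads(exported)
same = workflow_fingerprint(reloaded) == workflow_fingerprint(workflow)

# Changing a single parameter produces a different fingerprint.
modified = json.loads(exported)
modified[1]["parameters"]["patches"] = 7
changed = workflow_fingerprint(modified) != workflow_fingerprint(workflow)

print(same, changed)  # True True
```

A matching description is only a precondition for identical results; the software versions and any random seeds would also have to be fixed.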
Successes
Issues
1. Create a public repository of acquisition metadata and
image processing workflows for new acquisitions, as a
temporary repository until the data is finally analyzed and
deposited in the standard public databases (EMDB and
EMPIAR).
2. Create an authentication policy such that biologists leaving
an EM facility can continue the image
processing on some of the EOSC cloud machines.
Lessons Learned
• There is a big gap between technological advances and
their adoption by EU facilities and scientists. Much of it is due
to funding:
  • Local resources for stream processing
  • Existence of temporary repositories
  • Access to high-end computer clusters
• There is a gap between open science promotion and the
obligation of facilities to keep and disclose publicly funded
data.
Bioimaging
Beatriz Serrano-Solano
Jean-Karim Hériché
The Science Challenge
▸ Biological images contain more information than described in their
original publications.
▸ Re-analyzing the images with machine learning algorithms can extract
new knowledge from these unexploited resources.
The Science Demonstrator
Successes
Issues
Lessons Learned
▸ EOSC Ecosystem
  ▸ Technical
    ▸ Lack of high-performance file system
    ▸ Lack of big memory machines (1 TB of RAM)
  ▸ Services
    ▸ User-unfriendly deployment and set-up (e.g. ElastiCluster)
    ▸ Inadequate training
It would have been more efficient to use the local HPC.
Photon and Neutron
Michael Schuh, DESY
The Science Challenge
Data
● Volume of hundreds of PBs
● Fast data ingest, tens of GB/s per detector
● File creation at kHz rates
Computing
● Fast resources for immediate online
analysis, monitoring running experiments
● Highly specialized offline analysis
frameworks used in physics, chemistry,
materials science, biology, nanotechnology
Policy
● Data Management Plans
● Sharing of FAIR data, methods, results
between users, sites and communities
● Control access during data embargos
● Persistence, long term archival
Images: desy.de/~twhite/crystfel, cid.cfel.de/research/femtosecond_crystallography
The Science Demonstrator
Motivation:
Data sets too large to take home
○ Execute code on cloud resources close to the data, avoid downloading large amounts of data to user systems
Solution:
IaaS and PaaS
○ No stack implementation by the user
○ Efficient resource management
○ Prepare federation of DESY OpenStack as EOSC resource
CaaS
○ Libraries for containerized software, tools and functions
○ Run user-defined software stacks
○ Container orchestration
FaaS
○ Containers as cloud functions
Service-oriented architecture with cloud computing technologies
Successes
Automated data processing
● Data comes in, FaaS automatically triggered
○ Create derived data
○ Extract metadata
Interactive data analysis
● Share and re-use complete workflows
● Jupyter Notebooks as graphical frontend, run anywhere from EOSC to small remote system
● Notebooks and functions published and continuously integrated via GitLab/Docker
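The automated processing pattern (data arrives, a function is triggered, metadata is extracted) can be sketched in a few lines. The handler name and the event shape `{"path": ...}` are hypothetical and not tied to any particular FaaS framework; the example simulates the trigger by calling the handler directly on a temporary file.

```python
import hashlib
import os
import tempfile


def handle_new_file(event: dict) -> dict:
    """Hypothetical cloud-function handler: extract basic metadata from a new file."""
    path = event["path"]
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }


# Simulate "data comes in": write a small file, then invoke the handler on it.
with tempfile.TemporaryDirectory() as workdir:
    incoming = os.path.join(workdir, "frame_0001.dat")
    with open(incoming, "wb") as handle:
        handle.write(b"\x00" * 1024)
    metadata = handle_new_file({"path": incoming})

print(metadata["name"], metadata["size_bytes"])  # frame_0001.dat 1024
```

In a deployed setting the storage system or a message hub would deliver the event, and the returned metadata would be written to a catalogue rather than printed.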
Issues
● Fully integrated template solutions (Magnum/Heat, TOSCA) for scaling COE
(container orchestration engine) clusters (Docker Swarm, Kubernetes, Mesos) are still cumbersome.
○ EOSC can do a great job of facilitating this with a good cluster-on-demand
service as an open science solution.
● Cloud functions (FaaS) have proven to be a good solution for short-running
functions and micro-services. Integration with existing HPC and HTC systems is still
undefined; request routing based on job profile needs research.
○ Submitting into existing HPC clusters
○ Virtualizing HPC clusters in the EOSC on demand
● Many licenses do not account for new container distribution channels and
deployments as cloud functions or as a service.
● An integrated AAI solution is needed, both technically and policy-wise.
● Will EOSC provide cloud application building blocks?
○ Container registries
○ Message hubs
○ GitLab
○ JupyterHub
Lessons Learned
● Scaling highly specialized scientific applications takes effort:
splitting into micro-services, containerizing, cloud deployments.
○ Strengthen co-development between cloud, infrastructure, and platform
DevOps, software developers, and data analysts.
● User interaction feels different with graphical applications; window
forwarding from cloud resources is often low-performing.
○ Clearly define where batch, headless, API-ready and GUI applications
are in focus.
● Fully templated virtualized HPC cluster solutions are still to emerge,
both for native deployments and for container clusters.
○ EOSC to provide collaborative templates as know-how
as well as cluster-on-demand solutions.
○ EOSC to provide sufficient resources
for large-scale deployments suitable for big data.

 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyLAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyChayanika Das
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Christina Parmionova
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfGABYFIORELAMALPARTID1
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasChayanika Das
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...Chayanika Das
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsMarkus Roggen
 

Recently uploaded (20)

GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
DETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptxDETECTION OF MUTATION BY CLB METHOD.pptx
DETECTION OF MUTATION BY CLB METHOD.pptx
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
The Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and FunctionThe Sensory Organs, Anatomy and Function
The Sensory Organs, Anatomy and Function
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Introduction Classification Of Alkaloids
Introduction Classification Of AlkaloidsIntroduction Classification Of Alkaloids
Introduction Classification Of Alkaloids
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary MicrobiologyLAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
LAMP PCR.pptx by Dr. Chayanika Das, Ph.D, Veterinary Microbiology
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 

Science Demonstrator Session: Life and Materials Sciences

  • 2. PanCancer Science Demonstrator - Sergei Yakneen, EMBL (www.eoscpilot.eu). The European Open Science Cloud for Research pilot project is funded by the European Commission, DG Research & Innovation under contract no. 739563.
  • 3. The Science Challenge
      - Collect Next Generation Sequencing data from several cohorts of cancer patients, generated at multiple sequencing centres and across multiple cancer types.
      - Reanalyze the data using a uniform and consistent data processing pipeline that follows established best practices from the International Cancer Genome Consortium (ICGC).
      - Analyze the integrated data set to identify patterns of germline and somatic mutation that act across cancer types in a PanCancer fashion.
  • 4. The Science Demonstrator
      - Utilize Butler, a cloud-based large-scale scientific workflow framework developed in the context of ICGC’s Pancancer Analysis of Whole Genomes project, to perform a coordinated data analysis across multiple clouds.
          - Code - https://github.com/llevar/butler
          - Paper - https://doi.org/10.1101/185736
      - Perform automated, repeatable deployments and configuration of the entire processing infrastructure at three academic cloud computing environments.
          - EMBL-EBI Embassy Cloud
          - ComputeCanada West Cloud
          - Cyfronet
      - Deliver a large dataset (>50 TB) to each cloud computing centre.
      - Use Butler to run PanCancer pipelines and monitor progress.
  • 5. Successes
               | EMBL/EBI Embassy                            | Compute Canada                                     | Cyfronet
      vCPU     | 1000                                        | 1000                                               | 700
      RAM      | 4 TB                                        | 4 TB                                               | 2.6 TB
      Disk     | 1 PB                                        | 150 TB                                             | 200 TB
      Data     | 448 samples from 224 prostate cancer donors | 422 samples from 211 pediatric brain tumour donors | 2081 samples from 1000 Genomes Project
      Raw data | 71 TB                                       | 62 TB                                              | 50 TB
      Status   | Alignment and variant calling completed     | Alignment and variant calling completed            | Alignment completed
      - Developed configurations for each cloud - https://github.com/llevar/eosc_pilot
      - Developed extensive documentation and examples - https://butler.readthedocs.io/en/latest/
      - Developed Butler self-healing capabilities.
      - Performed data staging via Cyfronet Onedata.
  • 6. Issues
      - The biggest issue encountered by the Science Demonstrator (SD) was the initial shortage of resources for operating at “cloud scale”.
          - Used 20% of the data set that was utilized for PCAWG
          - < 0.5% of the data set for the 100k Genomes Project
      - Repeatable provisioning of large clusters of VMs.
          - >10% of provisioning jobs experience failures
      - Data movement and staging.
          - A 50 TB data set takes up to two weeks to move between locations
          - Genomics data requires encryption and network security measures
          - Shared access to network-accessible storage creates processing bottlenecks
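To put the data movement figure from slide 6 in perspective, the following back-of-the-envelope sketch converts "50 TB in up to two weeks" into a sustained transfer rate. It assumes decimal units (1 TB = 10^12 bytes) and a continuous transfer over the full 14 days; neither assumption comes from the slides.

```python
# Sustained throughput implied by moving 50 TB in two weeks (figures from slide 6).
# Assumptions (not from the slides): decimal units and an uninterrupted transfer.
TB = 10**12  # bytes in one decimal terabyte


def sustained_throughput(size_bytes: int, days: float) -> tuple[float, float]:
    """Return the average rate as (megabytes per second, megabits per second)."""
    seconds = days * 24 * 60 * 60
    bytes_per_second = size_bytes / seconds
    return bytes_per_second / 10**6, bytes_per_second * 8 / 10**6


mb_s, mbit_s = sustained_throughput(50 * TB, days=14)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")
```

Under these assumptions the average works out to roughly 41 MB/s, or about a third of a 1 Gbit/s link, which suggests the two-week figure reflects encryption, staging and shared-storage overheads rather than raw network capacity alone.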
  • 7. Lessons Learned
      - Effectively supporting life sciences use cases like cancer genomics will require A LOT of resources.
      - Diverse data sets have diverse data handling requirements, so it is better to provide a variety of tools for building solutions than a single “solution”.
      - Automated detection and resolution of infrastructure issues (à la Butler self-healing) is imperative for effective operation at cloud scale.
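As an illustration of the kind of automated issue resolution slide 7 calls for, here is a minimal retry sketch for flaky VM provisioning. It is not Butler's actual self-healing mechanism: `provision_with_retry`, `FlakyProvisioner` and the `"vm-ready"` result are hypothetical names invented for this example, and a real system would also need monitoring and alerting.

```python
import time
from typing import Callable


def provision_with_retry(
    provision: Callable[[], str],
    max_attempts: int = 3,
    base_delay_s: float = 0.0,
) -> str:
    """Call `provision` until it succeeds or `max_attempts` is exhausted."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return provision()
        except RuntimeError as err:
            last_error = err
            if attempt < max_attempts:
                # Exponential backoff between attempts: base, 2*base, 4*base, ...
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(
        f"provisioning failed after {max_attempts} attempts"
    ) from last_error


class FlakyProvisioner:
    """Hypothetical stand-in for a cloud API call that fails a few times first."""

    def __init__(self, failures_before_success: int) -> None:
        self.failures_before_success = failures_before_success
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures_before_success:
            raise RuntimeError("transient provisioning failure")
        return "vm-ready"


provisioner = FlakyProvisioner(failures_before_success=2)
result = provision_with_retry(provisioner, max_attempts=3)
print(result, "after", provisioner.calls, "attempts")
```

If each attempt failed independently 10% of the time (an assumption; the slides only report that more than 10% of provisioning jobs fail), three attempts would leave about 0.1% of jobs unresolved.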
  • 8. EGA – FAIR Genomic Datasets - Tony Wildish, on behalf of Nino Spataro and the EGA-CRG team
  • 9. The Science Challenge
      The principal objectives of our SD are:
      i. Test the feasibility of data reproducibility in genomics
      ii. Prove the possibility of remastering genomic datasets
      iii. Render genomic datasets more FAIR
  • 10. The Science Demonstrator
      How we did it:
      - Implementing portable containerized genomic pipelines
      - Using a language that enables scalable and reproducible scientific workflows (Nextflow, available at https://www.nextflow.io/)
      - Storing the pipelines in a public repository, together with metadata describing each pipeline step and the tools and versions used
  • 11. Successes
      - Genomic pipeline portability: pipelines were successfully implemented and executed in a third-party infrastructure.
      - Genomic pipeline FAIRification: pipelines were deposited jointly with metadata describing the variables relevant for pipeline description and re-use. Pipelines available at: https://dockstore.org/workflows/github.com/CRG-CNAG/EOSC-Pilot
      - Feasibility of reproducibility and remastering in genomics: overall, 97.38% of the obtained variants are shared and 99.66% of the called genotypes agreed perfectly.
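The two reproducibility figures on slide 11 can be read as a comparison between an original and a replicated variant call set. The sketch below shows one plausible way to compute such numbers on toy data. It makes two assumptions not stated in the slides: that "shared" is measured against the union of both call sets, and that genotype agreement is measured only over shared sites. The variant keys and genotypes are invented for illustration.

```python
# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def compare_call_sets(
    original: dict[Variant, str], replicated: dict[Variant, str]
) -> tuple[float, float]:
    """Return (fraction of variants shared, fraction of shared genotypes that agree)."""
    shared = original.keys() & replicated.keys()
    union = original.keys() | replicated.keys()
    shared_fraction = len(shared) / len(union) if union else 0.0
    matching = sum(1 for v in shared if original[v] == replicated[v])
    genotype_agreement = matching / len(shared) if shared else 0.0
    return shared_fraction, genotype_agreement


# Toy call sets: three shared sites, one private site in each set.
original_calls = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 250, "C", "T"): "1/1",
    ("chr2", 75, "G", "A"): "0/1",
    ("chr3", 10, "T", "C"): "0/1",
}
replicated_calls = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 250, "C", "T"): "0/1",  # genotype differs from the original call
    ("chr2", 75, "G", "A"): "0/1",
    ("chr4", 42, "A", "T"): "1/1",
}

shared_fraction, genotype_agreement = compare_call_sets(original_calls, replicated_calls)
print(f"shared: {shared_fraction:.2%}, genotype agreement: {genotype_agreement:.2%}")
```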
  • 12. Issues
      - Original versions of some software were unavailable. Solved by using the closest available version.
      - Size of the selected dataset to replicate. Solved by limiting the replication to a subset of the original data.
      - Understanding the original pipelines was time-consuming. The absence of consolidated standards for storing and describing the original pipelines slowed down the pipeline implementation process.
  • 13. Lessons Learned
      - Reproducibility is a time-consuming task on both the implementation and the computational side.
      - Universal methods to describe pipelines are required, along with long-term repositories to keep the whole experiment reproducible.
      - A FAIR-compliant semantic repository in which to represent objects and their relationships is missing in the EOSC ecosystem.
      - Open science is still not perceived as a scientific obligation by scientific stakeholders. Continuous training and education are required to form a new generation of scientists.
  • 14. CryoEM - Carlos Oscar Sorzano (CSIC)
  • 15. The Science Challenge
      The CryoEM demonstrator aims to improve the reproducibility of image processing workflows by producing a Scipion workflow file that describes the image processing steps. This allows full reproduction of the same results when the data is reprocessed outside the microscope facility. The description can also be uploaded to public databases, so that other users can understand the process followed to achieve a given structure.
  • 16. The Science Demonstrator
      • Adapt Scipion (an image processing workflow engine) to report all inputs, outputs, and parameters used in a JSON file, so that the same processing can be reproduced.
      • Adapt Scipion to reproduce an existing workflow, producing exactly the same results as in the first run.
      • Connect Scipion to a public database (Electron Microscopy Data Bank) so that users can automatically submit their results.
      • Allow other users to visualize the workflow performed by other scientists.
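The idea of recording every input, output and parameter in a JSON file (slide 16) can be sketched as follows. This is not Scipion's actual workflow format: the `Step` fields, step names, paths and parameter names are all hypothetical, chosen only to show how a workflow description can be written out and read back without loss.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Step:
    """One processing step with everything needed to rerun it (hypothetical schema)."""

    name: str
    inputs: list[str]
    parameters: dict[str, float]
    outputs: list[str]


def dump_workflow(steps: list[Step]) -> str:
    """Serialize the workflow to JSON text with a stable key order."""
    return json.dumps([asdict(step) for step in steps], indent=2, sort_keys=True)


def load_workflow(text: str) -> list[Step]:
    """Rebuild the list of steps from JSON text produced by dump_workflow."""
    return [Step(**record) for record in json.loads(text)]


workflow = [
    Step("motion_correction", ["movies/"], {"dose_per_frame": 1.2}, ["micrographs/"]),
    Step("ctf_estimation", ["micrographs/"], {"defocus_max_um": 4.0}, ["ctf/"]),
]

workflow_json = dump_workflow(workflow)
restored = load_workflow(workflow_json)
print(workflow_json)
```

Because the serialized text round-trips to an identical list of steps, a second site could in principle replay the same steps with the same parameters, which is the property the slide is after.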
  • 17. Successes
  • 18. Issues
      1. Create a public repository of acquisition metadata and image processing workflows for new acquisitions, as a temporary repository until the data is finally analyzed and deposited in the standard public databases (EMDB and EMPIAR).
      2. Create an authentication policy that lets biologists leaving an EM facility continue their image processing on EOSC cloud machines.
  • 19. Lessons Learned
      • There is a big gap between technological advances and their adoption by EU facilities and scientists. Much of it is due to funding:
          • Local resources for stream processing
          • Existence of temporary repositories
          • Access to high-end computer clusters
      • There is a gap between open science promotion and the obligation of facilities to keep and disclose publicly funded data.
  • 20. Bioimaging - Beatriz Serrano-Solano, Jean-Karim Hériché
  • 21. The Science Challenge
      ▸ Biological images contain more information than described in their original publications.
      ▸ Re-analyzing the images with machine learning algorithms can extract new knowledge from these unexploited resources.
  • 22. The Science Demonstrator
  • 23. Successes
  • 24. Issues
  • 25. Lessons Learned
      ▸ EOSC Ecosystem
          ▸ Technical
              ▸ Lack of high-performance file system
              ▸ Lack of big memory machines (1 TB of RAM)
          ▸ Services
              ▸ User-unfriendly deployment and set-up (e.g. ElastiCluster)
              ▸ Inadequate training
      It would have been more efficient to use the local HPC.
  • 26. Photon and Neutron - Michael Schuh, DESY
  • 27. The Science Challenge
      Data
      ● Volume of hundreds of PBs
      ● Fast data ingest, tens of GB/s per detector
      ● File creation at kHz rates
      Computing
      ● Fast resources for immediate online analysis, monitoring running experiments
      ● Highly specialized offline analysis frameworks used in physics, chemistry, materials science, biology, nanotechnology
      Policy
      ● Data Management Plans
      ● Sharing of FAIR data, methods, results between users, sites and communities
      ● Control access during data embargos
      ● Persistence, long term archival
      Images: desy.de/~twhite/crystfel, cid.cfel.de/research/femtosecond_crystallography
  • 28. The Science Demonstrator
      Motivation: Data sets too large to take home
          ○ Execute codes on cloud resources close to the data, avoid downloading large amounts of data to user systems
      Solution:
      IaaS and PaaS
          ○ No stack implementation by the user
          ○ Efficient resource management
          ○ Prepare federation of DESY OpenStack as EOSC resource
      CaaS
          ○ Libraries for containerized software, tools and functions
          ○ Run user defined software stacks
          ○ Container orchestration
      FaaS
          ○ Containers as cloud functions
      Service oriented architecture with cloud computing technologies
  • 29. Successes
      Automated data processing
      ● Data comes in, FaaS automatically triggered
          ○ Create derived data
          ○ Extract metadata
      Interactive data analysis
      ● Share and re-use complete workflows
      ● Jupyter Notebooks as graphical frontend, run anywhere from EOSC to small remote system
      ● Notebooks and functions published and continuously integrated via GitLab/Docker
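The "data comes in, function is triggered, metadata is extracted" pattern from slide 29 can be sketched as a plain Python handler. The event shape (`{"path": ...}`), the handler name `on_file_arrival` and the metadata fields are assumptions made for this example; the slides do not describe the FaaS platform's real interface, and a real handler would stream large files rather than read them whole.

```python
import hashlib
import os
import tempfile


def on_file_arrival(event: dict) -> dict:
    """Hypothetical cloud-function handler: extract basic metadata for a new file."""
    path = event["path"]
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {
        "name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest,
    }


# Simulate a detector writing one file and the platform invoking the handler.
with tempfile.TemporaryDirectory() as tmp_dir:
    frame_path = os.path.join(tmp_dir, "frame_0001.raw")
    with open(frame_path, "wb") as handle:
        handle.write(b"detector frame")
    metadata = on_file_arrival({"path": frame_path})

print(metadata)
```

Keeping the handler a pure function of its event makes it easy to package in a container and to test locally before deploying it as a cloud function.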
  • 30. Issues
      ● Fully integrated template solutions (Magnum/Heat, TOSCA) for scaling COE clusters (Docker Swarm, Kubernetes, Mesos) are still cumbersome.
          ○ EOSC can do a great job in facilitating this with a good cluster-on-demand service as an open science solution
      ● Cloud Functions (FaaS) have proven to be a good solution for short-running functions and micro-services. Integration with present HPC and HTC systems is still undefined; request routing based on job profile needs research.
          ○ Submitting into present HPC clusters
          ○ Virtualizing HPC clusters in the EOSC on demand
      ● Many licenses do not account for new container distribution channels and deployments as cloud functions or as a service.
      ● An integrated AAI solution is needed, both technically and policy-wise.
      ● Will EOSC provide cloud application building blocks?
          ○ Container registries
          ○ Message hubs
          ○ GitLab
          ○ JupyterHub
  • 31. Lessons Learned
      ● Scaling highly specialized scientific applications takes effort: splitting into micro-services, containerizing, cloud deployments.
          ○ Strengthen co-development between cloud, infrastructure, platform DevOps and software developers as well as data analysts.
      ● User interaction feels different with graphical applications; window forwarding from cloud resources is often low-performing.
          ○ Clearly define where batch, headless, API-ready and GUI applications are in focus.
      ● Fully templated virtualized HPC cluster solutions are still to emerge, both for native deployments and for container clusters.
          ○ EOSC to provide collaborative templates as know-how as well as cluster-on-demand solutions.
          ○ EOSC to provide sufficient resources for large-scale deployments suitable for big data.