EUDAT Generic Execution Framework
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
The EUDAT Generic Execution Framework
Workshop on Array Databases for Research Communities, Porto
Asela Rajapakse (MPI-M), January 22, 2018
2. The development team
Work package 5:
Alexandr Chernov, Eberhard Karls Universität Tübingen (EKUT)
Emanuel Dima, EKUT
Pascal Dugénie, Centre Informatique National de l’Enseignement Supérieur (CINES)
(Formerly) Weu Qiu, EKUT
Asela Rajapakse, Max-Planck-Institut für Meteorologie (MPI-M)
Luca Trani, Koninklijk Nederlands Meteorologisch Instituut (KNMI)
Work package 8:
Xavier Pivan, Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS)
Christian Pagé, CERFACS
3. EUDAT’s objective
EUDAT (short for European Data) seeks to fulfill the vision of
the High Level Expert Group on Scientific Data² and aims
“to provide an integrated, cost-effective, and sustainable
pan-European solution for sharing, preserving, accessing,
and performing computations with primary and secondary
research data.”
– EUDAT2020 Description of Work
² High Level Expert Group on Scientific Data:
Riding the wave: How Europe can gain from the rising tide of scientific data
(2010), final report to the European Commission
4. The EUDAT Service Suite
The service suite is meant to cover the basic data service requirements
of the participating research infrastructures (RIs).
EUDAT2020 Service Suite Overview: https://www.eudat.eu/services
5. EUDAT and the GEF
The Generic Execution Framework started as the topic
of a research activity in EUDAT 1. This produced a
partial prototype that was the starting point in EUDAT 2.
GEF development continues in EUDAT 2 and is
conducted in:
• Work package 5: Service Building
• Work package 8: Data Life Cycle across
Communities, necessitating research for possible
GEF extensions
6. Problems still facing communities
Research communities generate more data than ever, and
data volumes will only increase in the future.
As processing data often involves moving data from
storage nodes to compute resources, data processing
becomes more costly as data volumes increase.
This principle can be inverted by moving the
processing to the data by means of containerization.
Docker was chosen as the containerization solution for the GEF.
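The cost argument on this slide can be made concrete with a back-of-the-envelope calculation. The sketch below compares the time needed to move a dataset to the compute resource with the time needed to move a container image to the data. All sizes and the link bandwidth are assumed, illustrative values, not figures from the GEF project.

```python
def transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move size_bytes over a link with the given sustained bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes / bandwidth_bytes_per_s


GB = 10**9

# Assumed, illustrative values (not taken from the GEF project):
dataset_size = 500 * GB   # data held on the storage node
image_size = 0.5 * GB     # container image carrying the processing code
bandwidth = 0.125 * GB    # a 1 Gbit/s link moves 0.125 GB per second

move_data = transfer_seconds(dataset_size, bandwidth)
move_code = transfer_seconds(image_size, bandwidth)

print(f"moving the data to the code:  {move_data:.0f} s")
print(f"moving the code to the data:  {move_code:.0f} s")
```

The comparison ignores the cost of returning results, which is usually small when the processing reduces the data (for example, metadata extraction or aggregation).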
8. Docker containerization
Docker containers are based on Docker images, which
can be viewed as templates for container instances.
Every Docker image is built on top of a base image, e.g.
Ubuntu or BusyBox.
We will look at an example later!
“Docker containers wrap a piece of software in a complete
filesystem that contains everything needed to run: code,
[…] system tools, system libraries.” (Docker website)
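The image/container relationship resembles the relationship between a class and its objects. The following is a purely conceptual sketch of that analogy in plain Python: it does not use the Docker API and does not describe how Docker is implemented. The names `Image` and `Container` and the file paths are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Image:
    """Read-only template: a name, the files it adds, and an optional base image."""
    name: str
    files: Tuple[Tuple[str, str], ...] = ()
    base: Optional["Image"] = None

    def filesystem(self) -> Dict[str, str]:
        """Merge the base image's files with the files added by this image."""
        merged = self.base.filesystem() if self.base else {}
        merged.update(dict(self.files))
        return merged


@dataclass
class Container:
    """Instance of an image: sees the image's files, keeps its own changes."""
    image: Image
    changes: Dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        # Changes are local to this container; the image is never modified.
        self.changes[path] = content

    def files(self) -> Dict[str, str]:
        return {**self.image.filesystem(), **self.changes}


# A service image built on top of a base image (contents are made up).
base = Image("ubuntu", (("/bin/sh", "shell"),))
service = Image("my-service", (("/app/run.sh", "echo hello"),), base=base)

# Two containers created from the same image are independent of each other.
first = Container(service)
second = Container(service)
first.write("/tmp/out", "result")
```

After the last line, `/tmp/out` exists only in `first`; `second` and the `service` image are unchanged, just as one object's state does not affect its class or sibling objects.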
10. GEF: current status
The GEF code, along with its documentation, is hosted
on GitHub: https://github.com/EUDAT-GEF/GEF
A GEF testing instance has been set up on a VM at
Gesellschaft für wissenschaftliche Datenverarbeitung
Göttingen (GWDG). It runs on their VMware cluster and
can be reached at https://eudat-gef.mpimet.mpg.de
The GEF has been available as a beta version since September
2017 and is currently between technology readiness levels
TRL5 and TRL6.
11. GEF use cases
Several use cases and example services:
1. The seismology use case for extracting metadata
from seismological waveform data deposited either
in B2SHARE or B2SAFE.
2. The CDO use case for post-processing of climate
data stored in B2SHARE, B2DROP or B2SAFE.
3. The more advanced climate use case that involves
post-processing of data stored in B2SAFE or ESGF
using the EGI infrastructure (see last slide).
12. GEF use cases
4. The B2SHARE use case integrates the GEF and
B2SHARE so B2SHARE can employ the GEF for
internal metadata extraction on ingested data.
5. A CLARIN use case that involves their WebLicht
workflow management system and directives (see
last slide).
13. We will look at an example of how a GEF service is built.
19. Future plans for the GEF
The aim is to build a GEF service repository for trusted and
tested service images to form a closed system for all GEF
services:
• This will be a dedicated Docker Hub repository.
• The GEF will only accept images from this repository.
• The service images will be immutable once stored in the
repository (versions are possible).
• The repository is to be publicly readable and all stored
images are open to the peer-review process.
Time constraints will likely cut this short.
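Two of the planned rules, accepting images only from one repository and keeping stored images immutable, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GEF code: the namespace `gef-services`, the function names, and the `ServiceRepository` class are all hypothetical, and the parser handles only the simple `namespace/name:tag` form of an image reference (no registry host or port).

```python
TRUSTED_NAMESPACE = "gef-services"  # hypothetical Docker Hub namespace


def parse_reference(ref: str):
    """Split 'namespace/name:tag' into parts; the tag defaults to 'latest'."""
    repo, _, tag = ref.partition(":")
    namespace, _, name = repo.partition("/")
    if not name:
        # No slash in the reference: there is no explicit namespace.
        namespace, name = "", namespace
    return namespace, name, tag or "latest"


def is_accepted(ref: str) -> bool:
    """Accept only images that live in the trusted namespace."""
    return parse_reference(ref)[0] == TRUSTED_NAMESPACE


class ServiceRepository:
    """Toy model of an immutable store: a (name, version) pair is written once."""

    def __init__(self):
        self._images = {}

    def publish(self, name: str, version: str, digest: str) -> None:
        key = (name, version)
        if key in self._images:
            raise ValueError(f"{name}:{version} already exists; publish a new version")
        self._images[key] = digest

    def digest(self, name: str, version: str) -> str:
        return self._images[(name, version)]
```

Under this model, changing a service means publishing it under a new version, so a job that names `cdo:1.0` always runs the same image, which supports reproducibility.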
20. More about the GEF
More information on GEF integration by communities and
GEF development is presented during the
EUDAT Conference 2018 in:
• Presentation by Xavier Pivan in the session “Making data and
cloud resources interoperable using EUDAT and EGI
services” on Wednesday from 11:00h to 12:30h in room
Bragança.
• Presentation by Claus Zinn in the session “Harvesting results
from the EUDAT Community: Demonstrators and Pilots”
on Wednesday from 13:30h to 15:00h in room Lagos.
• GEF demonstration by Asela Rajapakse in the marketplace
session “An in depth look at the future CDI services” on
Wednesday from 11:00h to 12:30h in room Lagos.
The GEF software is to facilitate processing of scientific data in a manner that is easily reproducible and minimizes data transfer cost.
Still under development.
The goal of EUDAT is not to replace existing solutions in the European research infrastructures, but to support and enrich them.
B2STAGE: transfer data between EUDAT data nodes and HPC resources
B2SAFE: replication and preservation of research data
B2SHARE: store and share small scale research data
B2FIND: portal for finding research data in EUDAT data centers and other data repositories
B2DROP: a personal cloud solution to store and share data in the early stages of the research data life cycle (DLC)
Images -> containers, class -> object, construction plan -> construct