Data Processing and Analysis
1. EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065 www.eudat.eu
EUDAT WP5 Service Building
Tom Kirkham
STFC
2. DATA PROCESSING AND ANALYSIS
- GEF
- Big Data Tools
- B2NOTE
- Data Distribution
3. B2STAGE provides API services to manage data transfers, integrated with B2SAFE, B2HANDLE and B2ACCESS. The service allows users to:
- Transfer large data collections from EUDAT storage facilities to external HPC facilities for processing
- In conjunction with B2SAFE, replicate community data sets, ingesting them onto EUDAT storage resources for long-term preservation
- Ingest computation results into the EUDAT infrastructure
eudat.eu/b2stage
EUDAT 6M EC Review, 28th October 2015, Brussels
OVERVIEW
4. • An access layer to the B2SAFE and B2FIND services, allowing users to store, preserve and find data
• Enables upload and download transfers of data objects to create collections
BACKGROUND
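The upload and download transfers described above might look like the following from a client's point of view. This is a minimal Python sketch using only the standard library; the base URL, the `/registered/<path>` endpoint layout, the `download=true` query parameter and the bearer-token header are all assumptions for illustration and should be checked against the B2STAGE HTTP API documentation. The functions only build requests; nothing is sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical deployment URL; replace with a real B2STAGE endpoint.
BASE_URL = "https://b2stage.example.org/api"


def registered_url(irods_path: str, **params: str) -> str:
    """Address a data object by its B2SAFE (iRODS) path; the layout is an assumption."""
    url = f"{BASE_URL}/registered/{quote(irods_path.lstrip('/'))}"
    if params:
        url += "?" + urlencode(params)
    return url


def upload_request(irods_path: str, payload: bytes, token: str) -> Request:
    """PUT the raw bytes of a data object into a collection."""
    return Request(
        registered_url(irods_path),
        data=payload,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # assumed token scheme
            "Content-Type": "application/octet-stream",
        },
    )


def download_request(irods_path: str, token: str) -> Request:
    """GET a data object; `download=true` is an assumed query flag."""
    return Request(
        registered_url(irods_path, download="true"),
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    req = download_request("/exampleZone/home/alice/data.nc", token="...")
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; keeping request construction separate from sending makes the addressing scheme easy to test.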
6. PROGRESS
Achievements
- Integration between B2Handle, B2Access and B2Safe
- Enablement of data movement into the CDI
- HTTP API as a method for common access, developed and released
- Integration with the data discovery service and support for standards such as PIDs
- Integration of community repositories with B2SAFE via the HTTP API (work done by Charles University)
- Proof of concept of the HTTP API on plain filesystems, for workspaces
Future Status
- Development continues
- Application to specific tools and filestores
7. THE GENERIC EXECUTION FRAMEWORK (GEF)
Goal: Enable execution of containerised software within the CDI, reducing data transfer and increasing customisation for user communities.
Technology objectives
- Utilise EUDAT services such as B2Share and B2Drop, with B2Safe planned
- Support a GEF rules engine (e.g. Drools)
- Integrate services from user communities into the CDI
8. GEF services/Docker containers
The GEF relies on so-called GEF services that are customised by the user to perform the required tasks.
GEF services are Docker images that are specifically annotated so that the GEF can handle them.
GEF service instances are Docker containers that are spun up for execution close to the data.
User communities are solely responsible for the contents of their images. During the pilot phase, communities will receive support in creating their own images, but in the long run, scientists will have to become proficient at building them.
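As an illustration of how annotated images might be recognised, the sketch below reads an image's labels (the `Config.Labels` map that `docker inspect` reports) and extracts service metadata. The `eudat.gef.service.` label prefix and the `name`, `description`, `input.*` and `output` keys are a hypothetical scheme, not the GEF's documented annotation format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical annotation scheme; the real GEF label names may differ.
GEF_PREFIX = "eudat.gef.service."


@dataclass
class GefService:
    name: str
    description: str = ""
    inputs: dict[str, str] = field(default_factory=dict)  # input id -> mount path
    output: str = ""                                      # output mount path


def parse_gef_labels(labels: dict[str, str]) -> GefService | None:
    """Return service metadata if the image carries GEF labels, else None."""
    gef = {k[len(GEF_PREFIX):]: v for k, v in labels.items() if k.startswith(GEF_PREFIX)}
    if "name" not in gef:
        return None  # not annotated as a GEF service, so the GEF would not handle it
    inputs = {k[len("input."):]: v for k, v in gef.items() if k.startswith("input.")}
    return GefService(
        name=gef["name"],
        description=gef.get("description", ""),
        inputs=inputs,
        output=gef.get("output", ""),
    )


if __name__ == "__main__":
    # Labels as they would appear under Config.Labels in `docker inspect` output.
    labels = {
        "eudat.gef.service.name": "word-count",
        "eudat.gef.service.input.corpus": "/gef/input",
        "eudat.gef.service.output": "/gef/output",
        "maintainer": "community@example.org",
    }
    print(parse_gef_labels(labels))
```

In a Dockerfile, such annotations would be set with `LABEL`, for example `LABEL eudat.gef.service.name="word-count"` (same hypothetical keys). Unlabelled images are ignored, which keeps the GEF from running arbitrary containers.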
9. A GEF INSTANCE
The container/GEF service invocations on the hosts are
controlled by a Docker Machine integrated with a GEF
instance.
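Running a GEF service close to the data amounts, at its simplest, to starting the container with the community data mounted into it. The sketch below builds such a `docker run` command; the mount points `/gef/input` and `/gef/output`, and the choice to disable networking, are assumptions for illustration. With Docker Machine, running `eval $(docker-machine env <host>)` first points the `docker` CLI at the chosen host, so the same command would start the container there.

```python
from __future__ import annotations

import shlex


def gef_run_command(
    image: str,
    input_dir: str,
    output_dir: str,
    input_mount: str = "/gef/input",    # hypothetical mount points
    output_mount: str = "/gef/output",
) -> list[str]:
    """Build a `docker run` invocation that brings the container to the data."""
    return [
        "docker", "run", "--rm",                  # remove the container when the job ends
        "--network", "none",                      # assumption: the job needs no network access
        "-v", f"{input_dir}:{input_mount}:ro",    # community data, mounted read-only
        "-v", f"{output_dir}:{output_mount}",     # results, to be ingested afterwards
        image,
    ]


if __name__ == "__main__":
    cmd = gef_run_command("community/tool:1.0", "/data/collection", "/data/results")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would start the container on the current Docker host.
```

Mounting the input read-only keeps a community's image from modifying preserved data; only the output directory is writable.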
10. THE GENERIC EXECUTION FRAMEWORK (GEF)
Achievements
- Generic Execution Framework (GEF) first released in September
- Integrated services from the Earth System Grid Federation (ESGF) and European Grid Infrastructure (EGI) e-infrastructures
Future Work
- Integration with other communities, such as the IS-ENES Climate4Impact platform
11. • Creates RDF triples
• Harvests information from ontology repositories
• Supports semi-automatic annotation using text mining
• Supports manual data annotation
• Easy-to-use user interface
• Writes data to the triple store
• Integrates with the different EUDAT B2 services
FEATURES
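To make "creates RDF triples" concrete, the sketch below emits a single annotation as N-Triples using the W3C Web Annotation vocabulary (`oa:`). Whether B2NOTE uses exactly these terms is an assumption here, and the target record URL and ontology term IRI in the usage lines are hypothetical. The resulting lines could be loaded into a triple store, for example through a SPARQL `INSERT DATA` update.

```python
from __future__ import annotations

import uuid

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.w3.org/ns/oa#"  # W3C Web Annotation vocabulary


def iri(value: str) -> str:
    """Wrap an IRI for N-Triples; assumes the IRI is already valid (no escaping)."""
    return f"<{value}>"


def annotation_triples(target: str, tag_iri: str, annotation_id: str | None = None) -> list[str]:
    """Return N-Triples lines stating that ontology term `tag_iri` tags `target`."""
    ann = annotation_id or f"urn:uuid:{uuid.uuid4()}"
    triples = [
        (ann, RDF_TYPE, OA + "Annotation"),
        (ann, OA + "hasTarget", target),   # the annotated data object or record
        (ann, OA + "hasBody", tag_iri),    # the semantic tag harvested from an ontology
        (ann, OA + "motivatedBy", OA + "tagging"),
    ]
    return [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples]


if __name__ == "__main__":
    # Hypothetical record URL and ontology term.
    for line in annotation_triples(
        "https://b2share.example.org/records/abc123",
        "http://example.org/ontology/SeaWaterTemperature",
    ):
        print(line)
```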
12. Achievements
B2NOTE module created to support the creation of annotations
Standards-based and integrated with B2Share
B2Access integration gives users federated access to resources
Software released in January, with over 100 active users
Future Work
Integration into communities such as OpenAIRE
Further development in the EOSC project
Easy integration into community services and within OpenAIRE and EOSC-hub services
13. BIG DATA ANALYSIS
Goal: To open up data deposited in the EUDAT CDI to ‘Big Data’ processing
Objectives:
Integrate a ‘Big Data’ stack into the CDI to handle data from EUDAT components
Enable ‘Big Data’ analysis in user communities
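Hadoop and Spark distribute the map, shuffle and reduce pattern across a cluster. The single-process Python sketch below shows that pattern on hypothetical (float identifier, temperature) records, computing a mean per float. Carrying (sum, count) pairs rather than means lets partial results be combined in any order, which is what makes the reduction parallelisable.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Emit (key, (sum, count)) so partial results can be combined in any order."""
    for float_id, temperature in records:
        yield float_id, (temperature, 1)


def shuffle(pairs):
    """Group values by key; on a cluster this is the network-heavy step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine (sum, count) pairs per key, then turn each into a mean."""
    means = {}
    for key, values in groups.items():
        total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)
        means[key] = total / count
    return means


if __name__ == "__main__":
    # Hypothetical (float identifier, temperature in degrees C) records.
    records = [("float-A", 10.0), ("float-B", 4.0), ("float-A", 12.0)]
    print(reduce_phase(shuffle(map_phase(records))))
```

In PySpark the same pipeline would read roughly `rdd.map(lambda r: (r[0], (r[1], 1))).reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1])).mapValues(lambda s: s[0] / s[1])`, with the cluster performing the shuffle.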
15. BIG DATA ANALYSIS
Achievements
Apache Spark and Hadoop enabled in EUDAT
Data subscription service created to link analysis
results with user communities
Integrated within the EUROARGO use case
Future Work
Further development and integration of the data subscription service into other projects, such as EOSC
16. DATA DISTRIBUTION SERVICE
Data distribution in terms of discovery, transfer and integration has been a core focus of this cluster:
Federated integration of data
Data annotation layer aiding discovery
Integration with services via a common API
Event-based subscription to data
Beyond EUDAT, this technology is reaching out into other projects, raising the possibility of a wider view of Data Distribution as a Service.
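Event-based subscription can be pictured as a publish/subscribe broker: communities register criteria, and each new-data event is delivered to every subscriber whose criteria it matches. The in-memory sketch below is hypothetical and does not describe how the EUDAT data subscription service is actually implemented; matching on exact metadata values is a simplifying assumption.

```python
from __future__ import annotations

from typing import Callable


class SubscriptionBroker:
    """Minimal in-memory publish/subscribe broker for data-arrival events (hypothetical)."""

    def __init__(self) -> None:
        self._subscriptions: list[tuple[dict[str, str], Callable[[dict[str, str]], None]]] = []

    def subscribe(self, criteria: dict[str, str], callback: Callable[[dict[str, str]], None]) -> None:
        """Register interest in events whose metadata matches every key/value in `criteria`."""
        self._subscriptions.append((criteria, callback))

    def publish(self, event: dict[str, str]) -> int:
        """Deliver `event` to each matching subscriber; return the number of deliveries."""
        delivered = 0
        for criteria, callback in self._subscriptions:
            if all(event.get(key) == value for key, value in criteria.items()):
                callback(event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = SubscriptionBroker()
    broker.subscribe({"community": "EUROARGO"}, lambda e: print("new data:", e["pid"]))
    broker.publish({"community": "EUROARGO", "pid": "hypothetical/pid-1"})
```

A production service would add persistence and asynchronous delivery, but the matching step stays the same: subscribers describe what they want once, and the service pushes results as they appear.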
18. SUMMARY
Software released:
B2STAGE HTTP API
B2NOTE
Generic Execution Framework
Data Subscription Service
Community use to go beyond the project
Projects such as EOSC-hub and SeaDataCloud are actively working with the software beyond the project