EUDAT
A cross-disciplinary data infrastructure in Horizon 2020
Damien Lecarpentier
EUDAT Project Manager
CSC – IT Center for Science Ltd
Exponential growth

Data “Deluge”
Zettabytes
Exabytes
Petabytes
Terabytes
Gigabytes
Increasing complexity and variety

• Where to store it?
• How to find it?
• How to make the most of it?

Synergies
If there are hundreds of Research Infrastructures, how many
different data management systems can we sustain?

Riding the Wave
Collaborative Data Infrastructure
A framework for the future?

Trust

Data Curation

Data Generators

Users

Community Support Services

Common Data Services
Consortium

Seven Research Communities on Board
• EPOS: European Plate Observing System
• CLARIN: Common Language Resources and Technology Infrastructure
• ENES: Service for Climate Modelling in Europe
• LifeWatch: Biodiversity Data and Observatories
• VPH: The Virtual Physiological Human
• INCF: International Neuroinformatics Coordinating Facility
• DRIHM: Distributed Research Infrastructure for Hydrometeorology
User Forums + 25 communities

1st User Forum, 7-8 March 2012, Barcelona

Service Building Process
Takes time!
Infrastructure coordination (resources, security, etc.)

Reusing existing technologies and expertise rather than reinventing everything!
Selected Services
• Metadata Catalogue: aggregated EUDAT metadata domain; data inventory
• PID: identity, integrity, authenticity, locations
• Data Staging: dynamic replication to HPC workspace for processing
• Safe Replication: data curation and access optimization
• Simple Store: researcher data store (simple upload, share and access)

New services to come
• EUDAT Box: Dropbox-like service, easy sharing, local syncing
• Semantic Anno: checking & referencing
• AAI: network of trust among authentication and authorization actors
• Dynamic Data: immediate handling
Safe Replication Service
• Robust, safe and highly available data replication service for small- and medium-sized repositories
– To guard against data loss in long-term archiving and preservation
– To optimize access for users from different regions
– To bring data closer to powerful computers for compute-intensive analysis

(Diagram: PIDs and policy rules within the EUDAT CDI Domain of registered data)

http://eudat.eu/safe-replication | eudat-safereplication@postit.csc.fi
Data Staging Service
• Support researchers in transferring large data collections from EUDAT storage to HPC facilities
• Reliable, efficient, and easy-to-use tools to manage data transfers
• Provide the means to re-ingest computational results back into the EUDAT infrastructure

(Diagram: EUDAT CDI Domain of registered data, PRACE, HPC)

http://eudat.eu/datastaging | eudat-datastaging@postit.csc.fi
Simple Store Service
• Allow registered users to upload “long tail” data into the EUDAT store
• Enable sharing objects and collections with other researchers
• Utilise other EUDAT services to provide reliability and data retention

(Diagram: simple upload, simple metadata, PID registration, EUDAT CDI Domain of registered data)

http://eudat.eu/simplestore | eudat-simplestore@postit.csc.fi
Metadata Service
• Easily find collections of scientific data, generated either by various communities or via EUDAT services
• Access those data collections through the given references in the metadata to the relevant data stores
• Europeana of scientific data

(Diagram: EUDAT CDI Domain of registered data)

http://eudat.eu/metadata | eudat-metadata@postit.csc.fi
Towards Horizon 2020
User-driven services
Sustainability

Trust

Synergy

Joint e-infrastructure roadmaps
Global collaboration

A Network of Trusted Centers
Generic data centres

Community data sites

• Strong and sustainable generic data centres with existing trusted relationships
• Each having a specific relationship with research communities
• EUDAT is about providing solutions in a federated environment
Bridging National and European solutions
• Strong requirement from researchers and funders
→ Path to Sustainability
EUDAT Priorities in H2020
• Consolidation of Core Services
– Increased performance, new functionalities, AAI, etc.
– Develop tools and policies to facilitate usage: data management plans, licensing, training, etc.
– Development of new services

• Financial Sustainability
– Cost and funding models
– Framework and mechanisms for sharing resources across sites and across communities (juste retour, etc.)

• Interoperability
– E-Infrastructures → a joint roadmap?
– National initiatives → service portfolios
– RDA → EUDAT as a driver and implementer
eudat-info@postit.csc.fi


EUDAT presentation, November 2013 | www.eudat.eu


Editor's Notes

  • #8 Project partners represent the data scientists in these consortia. EPOS: data and observatories for earthquakes, volcanoes and tectonics, based on sensor data. CLARIN: making language resources and technology usable. ENES: simulations of the climate system using HPC. LifeWatch: biodiversity research. VPH: biomedical modelling and simulation of the human body.