Daan Broeder presents the EUDAT community
Workshop title: Organising high-quality research data management services
Workshop abstract:
Open science needs high-quality data management in which researchers can create, use and share data according to well-defined standards and practices. This is one of the pillars of Open Science. The data management landscape contains quite a few organisations that aim to achieve this. Getting it right, however, calls for a collaboration in which each organisation plays a suitable role and the services are presented consistently to the researcher.
The proposed workshop brings together representatives of a standards organisation (RDA), an e-infrastructure (EUDAT) and libraries (LIBER), which together can organise high-quality data management for research.
DAY 1 - PARALLEL SESSION 2
http://opensciencefair.eu/workshops/organising-high-quality-research-data-management-services
3. A bit of EUDAT history
EUDAT started in 2011 as an initiative to face the data deluge and the increasing complexity and costs of isolated solutions:
- provide general data management services for a large variety of communities
- there are many common requirements …
- … addressed by common services delivered by a federation of compute and data centres
- … research-community driven
- sustainable
EUDAT initiative funded by EC projects:
- EUDAT: 2011–2014, 25 partners, €16M
- EUDAT2020: 2015–2018, 37 partners, €20M
- Participation in EOSC-hub from 2018
4. Community Repositories (thematic data centres)
EUDAT generic data service provider: storage, workflows, processing, archive
Collaborative Data Infrastructure (CDI)
- A collaboration between service providers and research communities
- A Partnership Agreement specifying the mutual obligations between the EUDAT centres
- A portfolio of data management services
- A data and service model that ensures the CDI’s interoperability, extensibility and stability
- A network providing consultancy, training and sharing of technology across partners
6. EUDAT Core Communities Partners
- Common Language Resources and Technology Infrastructure (CLARIN)
- European Network for Earth System Modelling (ENES)
- Distributed infrastructure for life-science information (ELIXIR)
- European Plate Observing System (EPOS), the solid Earth sciences research infrastructure
- Integrated Carbon Observation System (ICOS), to quantify and understand the greenhouse gas balance
- Long-Term Ecosystem Research (LTER) in Europe
Partners in EUDAT2020, instrumental for EUDAT service strategy development and positioning
7. Broad community engagement
Requirements & use-case gathering
- Core Communities specify broad sets of requirements covering their whole data life cycle
- Call for Data Pilots -> 24 collaborations, large variety of disciplines & use cases
- Participation in community projects, EUDAT WGs, community interview TF
- All very labour-intensive and time-consuming
Outreach & Communication aspects
- Communities organised at an EU level should be covered
- In the current project and beyond, but …
- Outreach to smaller groups and individuals scales badly
- Can leverage EUDAT centres participating in national projects
- Training & documentation addressing different levels of technical proficiency require large investments
8. How to be more efficient?
Make use of specialised organisations & networks …
Requirements to be delivered by RDA, W3C, ISO, IETF, …?
Accepting such input is often self-evident, but not always.
What about aspects such as speed, coverage, pragmatism, flexibility and representativeness?
Data management expertise and consultancy for DM services to be provided by academic & research libraries?
Do they have the ambition, expertise and capacity for that?
Overall questions:
Is it in the nature of such organisations to collaborate on this topic?
How can this be organised while remaining inclusive to others?
9. A More Efficient Picture?
Diagram: standards organisations → service requirements definition (DM service definition); academic libraries → DM expertise, consultancy & training; e-infrastructures (EINFRA) → service provisioning
11. EUDAT Recipe
- Building trust between RIs & e-Infrastructures
  - Co-design of services
  - Agree on roles and responsibilities
- Bringing e-Infrastructures together
  - Choosing cooperation rather than competition
  - Focus on core business
- Leveraging national resources
  - EU grants and initiatives need to support the national systems
12. RDA Uptake by EUDAT
Data Foundation and Terminology (DFT): The DFT WG has built up a common definition of some important terms, such as “digital object”, and of the relations between terms. These results help us ensure that we use the same terminology as our research communities, many of whom were also involved in these efforts.
Data Type Registry (DTR): The DTR WG provided a schema and a prototype federated registry system based on that schema. The schema can be used to formally describe the content of data types in sufficient detail to make them automatically actionable.
PID Information Types (PIT): The PIT WG provided a protocol and an approach to
harmonize the way in which we label the data types associated with a PID, for instance, as
a “checksum” or as an “author” field.
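The PIT idea of labelling PID attributes with shared types can be illustrated with a minimal Python sketch. It assumes that each shared type is identified by a string; the identifiers below (`hypothetical.registry/type/...`) and the community labels in `LABEL_MAP` are hypothetical placeholders, not real registered types or the PIT WG's actual protocol. Once fields are re-keyed by type, a client can verify a checksum without knowing which community created the record.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical type identifiers; in practice these would be PIDs resolving
# to entries in a data type registry. They are NOT real registered types.
CHECKSUM_SHA256 = "hypothetical.registry/type/checksum-sha256"
AUTHOR = "hypothetical.registry/type/author"

# Hypothetical community-specific labels mapped to the shared types.
LABEL_MAP = {
    "sha256": CHECKSUM_SHA256,
    "checksum": CHECKSUM_SHA256,
    "creator": AUTHOR,
    "author": AUTHOR,
}


@dataclass
class PidRecord:
    pid: str
    attributes: dict = field(default_factory=dict)


def harmonize(pid, local_fields):
    """Re-key community-specific fields by shared type identifiers."""
    record = PidRecord(pid)
    for label, value in local_fields.items():
        type_id = LABEL_MAP.get(label.lower())
        if type_id is not None:  # unknown labels are simply skipped
            record.attributes[type_id] = value
    return record


def verify(record, payload):
    """Check a payload against the typed checksum, whatever label it came from."""
    expected = record.attributes.get(CHECKSUM_SHA256)
    return expected is not None and hashlib.sha256(payload).hexdigest() == expected


data = b"example payload"
digest = hashlib.sha256(data).hexdigest()
a = harmonize("hypothetical/pid-a", {"sha256": digest, "creator": "A. Researcher"})
b = harmonize("hypothetical/pid-b", {"Checksum": digest, "author": "B. Researcher"})
print(verify(a, data), verify(b, data))  # True True
```

The design choice shown is that clients depend only on the shared type identifiers, so adding a new community means extending the label mapping rather than changing every client.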
Practical Policy (PP): The PP WG produced a cookbook of best-practice actionable data
policies.
Metadata standards directory: This group transformed the DCC metadata catalogue to
make use of a community maintenance model. This helps ensure that the catalogue of
metadata standards that are in use stays up to date.
DSA/WDS Certification of digital repositories: This group merged two data repository certification schemes, namely the Data Seal of Approval (DSA) and the World Data System (WDS) certification. This increases the momentum of the certification schemes and also makes the certification already taken up by many EUDAT partners more relevant.
13. The Complete Picture?
Diagram: DM service definition; DM expertise, consultancy & training; DM service provisioning
14. The Complete Picture?
Diagram: DM service provisioning; DM service definition; DM expertise, consultancy & training
15. EUDAT Data Domain modelled on the ANDS¹ Data Curation Continuum
1. Australian National Data Service organization – www.ands.org.au
Diagram: CDI Data Domain
16. Leveraging Partner Node Services
Individual EUDAT nodes contribute local services, including expertise & guidance, to the CDI service portfolio.
Relevant services for the sensitive data domain are:
- ePouta from CSC: a secure cloud computing environment
- TSD from the University of Oslo: a secure platform to collect, store and analyse sensitive data
Some partners are starting to address health & medical administrative data.
Editor's Notes
WHAT IS EUDAT?
- A UNIQUE INITIATIVE OF THE MAJOR EUROPEAN DATA & COMPUTING CENTERS
- TOGETHER WITH A LARGE VARIETY OF RESEARCH COMMUNITY ORGANIZATIONS INCLUDING MANY THEMATIC DATA CENTERS
THAT BUILD AND OPERATE AN INFRASTRUCTURE FOR DELIVERING GENERAL DM SERVICES AT A PAN-EU LEVEL
THOSE CENTERS ARE ALREADY PROVIDING DM SERVICES TO NATIONAL COMMUNITIES.
BUT THESE COMMUNITIES ARE THEMSELVES USUALLY PART OF WIDER NETWORKS AND NEED TO BE SERVED BY PAN-EUROPEAN SOLUTIONS.
THAT IS WHAT MAKES THE DATA CENTERS COME TOGETHER IN EUDAT TO PROVIDE A FEDERATED LAYER OF SERVICES
The major EU research compute & data centers and general IT service providers and a broad variety of community RI organizations
EUDAT2020 is a project of some 20ME with 37 consortium partners
INITIATIVE STARTED IN 2011; THE “RIDING THE WAVE” REPORT FROM 2010 WAS OF COURSE VERY INFLUENTIAL FOR EUDAT TOO
TWO EC PROJECTS SUPPORTING EUDAT; THE SECOND ONE, EUDAT2020, NOW HAS 37 PARTNERS AND A BUDGET OF 20ME
Pewi’s email example
Sustainable is bridge to projects and to the CDI PA
Federation in its ‘political’ meaning: centers that have entered into an agreement to do things together
TO CREATE AND OPERATE THE EUDAT INFRASTRUCTURE EUDAT LEVERAGES SOMETHING CALLED THE CDI
THE CDI IS THE VEHICLE WE USE TO PROVIDE THE EUDAT SERVICES
WITH THE CDI WE CREATE THE ENVIRONMENT THAT ALLOWS US TO DELIVER OUR DM SERVICES
Service providers are the main compute & data centers in EUROPE
Governed by the CDI council
A collaboration between Service Providers and Research Communities
A Partnership Agreement specifying the mutual obligations between the EUDAT centres
a portfolio of data management services
A data and service model that ensures the CDI’s interoperability, extensibility and stability
Looking to the future we have the CDI partnership agreement and a secretariat as a basis for future coordination.
Consultancy and training
broad-based critical mass of data knowledge
long reach, excellent customer channel
Network of best practice, co-design, technology exchange
sharing of ideas, technologies, solutions across partners
B2DROP as a Dropbox-type collaboration solution
B2SHARE as an archiving/publication platform
B2FIND as an interdisciplinary metadata catalogue
None of these is a killer service on its own, but the integration between them gives considerable added value.
Make the integration statement
OUR COMMUNITIES:
CORE COMMUNITIES ARE PARTNERS IN THE PROJECT. THEY ARE RESEARCH INFRASTRUCTURES, AND WE WORK WITH THEM TO INTEGRATE EUDAT SERVICES IN THEIR WORKFLOWS OR TO CREATE NEW SERVICES, SUCH AS THE DATA SUBSCRIPTION & DISTRIBUTION SERVICE WE WORK ON WITH ELIXIR
Partners in the project whose uptake of services is part of the project; long-term commitment
We need them to deliver requirements, test our services and use our services; they should be the purpose in the life of the service providers
BEYOND THE CORE COMMUNITIES WE ENGAGE WITH A LARGE GROUP OF RESEARCH COMMUNITIES AND PROJECTS THROUGH DATA PILOTS, WHERE WE WORK ON SPECIFIC SOLUTIONS FOR THEIR DM REQUIREMENTS
WE ALSO HAVE THE INSTRUMENT OF EUDAT WORKING GROUPS WHERE EUDAT EXPERTS DISCUSS AND WORK WITH COMMUNITY EXPERTS ALSO OUTSIDE THE EUDAT PROJECT ABOUT SPECIFIC DM SUBJECTS:
SEMANTICS, ARRAY DB, AND RECENTLY WE STARTED A WG ON SENSITIVE DATA
Refer to the booklet
Mention the topics of the WGs: Semantics, Array Database Technologies, Sensitive Data: newly started
investigate & meet requirements from Data Pilots and partners
Scope: BioMed, Social Sciences, Life Sciences
SUMMARIZING WHAT ARE UNIQUE FEATURES OF EUDAT:
CLOSE COLLABORATION WITH THE COMMUNITIES: CO-DESIGN OF SERVICES AND SEPARATION OF ROLES AND RESPONSIBILITIES
WE CAN COLLABORATE WITH OUR COLLEAGUE INFRASTRUCTURES BECAUSE WE KNOW OUR CORE BUSINESS
WE LEVERAGE THE RESOURCES AND INVESTMENTS IN THE NATIONAL SYSTEMS, WE JUST WANT TO MOVE THESE TO A PAN-EU LEVEL
i.e. projects need to come from real needs at national levels to be sustained. NO PARALLEL STRUCTURE
But here are some recipes from EUDAT that can be applied:
Building trust between RIs & EINFRAS
Bringing e-Infrastructures together (EUDAT is trying to do that in various ways, including in the context of the EOSC)
Building services that are embedded nationally (not just to serve European homeless scientists!)
MADE AVAILABLE TO CDI USERS BUT OF COURSE AT A COST PRICE DEPENDING ON THE RELATION WITH THE CUSTOMER
Some partners are starting to address health & medical administrative data, e.g. in collaboration with hospitals (this is the case at CSC).
TSD provides a platform for researchers working at UiO and in other public research institutions (UH-sector, university hospitals etc.) to collect, store and analyze sensitive research data.
ePouta is a Finnish cloud computing environment (Infrastructure as a Service, IaaS) designed for processing sensitive data. It is a closed environment that meets the regulations for an elevated information security level. It is suitable for all fields of science, and also for government and research-sector organisations. The cloud service combines virtual computational resources with the customers' own resources using a dedicated light path, and it scales easily to customers' requirements.
https://video.nordu.net/media/Web+security+and+Secure+Research+-+NDN16+-+Track3+D3+0900/0_qd6pe7on/44520 (59 minutes into the video)
Such services