www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Service provisioning for
Excellent Science
Daan Broeder – EUDAT/CLARIN
EUDAT
A bit of EUDAT history
EUDAT started in 2011 as an initiative to face the data deluge
and the increasing complexities and
costs of isolated solutions
provide general data management services
for a large variety of communities
there are many common requirements
… addressed by common services delivered
by a federation of compute and data centers
… research community driven
sustainable
EUDAT Initiative Funded by EC projects
EUDAT (2011 – 2014): 25 partners, 16 M€
EUDAT2020 (2015 – 2018): 37 partners, 20 M€
Participation in EOSC-Hub from 2018
Community Repositories
(thematic data centres)
EUDAT generic data
service provider
storage, workflows,
processing, archive
Collaborative Data Infrastructure (CDI)
A collaboration between
Service Providers and
Research Communities
A Partnership Agreement
specifying the mutual obligations
between the EUDAT centres
a portfolio of data
management services
A data and service model that ensures
the CDI’s interoperability, extensibility
and stability
Network providing consultancy,
training, sharing of technology across
partners
What is the EUDAT Service offer?
EUDAT Core Communities Partners
Common Language Resources and Technology
Infrastructure (CLARIN)
European Network for Earth System Modelling (ENES)
Distributed infrastructure for life-science information
(ELIXIR)
European Plate Observing System (EPOS) - Solid Earth
sciences Research Infrastructure
Integrated Carbon Observation System (ICOS) to quantify
& understand greenhouse gas balance
Long-Term Ecosystem Research (LTER) in Europe
Partners in EUDAT2020,
instrumental for EUDAT service strategy development and positioning
Broad community engagement
Requirements & use-case gathering
- Core Communities specify broad sets of requirements covering
their whole data life cycle
- Call for Data Pilots -> 24 collaborations, large variety of
disciplines & use cases
- Participation in community projects, EUDAT WGs, community
interview TF
- All very labor-intensive and time-consuming
Outreach & Communication aspects
- Communities organised at an EU level should be covered
- In the current project and beyond, but …
- Outreach to smaller groups and individuals scales badly
- Can leverage EUDAT centers participating in national projects
- Training & Documentation addressing different levels of
technical proficiency requires large investments
How to be more efficient?
Make use of specialized organizations & networks …
Requirements to be delivered by RDA, W3C, ISO, IETF, …?
Accepting such input is often self-evident, but not always
What about aspects such as speed, coverage, pragmatism, flexibility,
representativity?
Data management expertise and consultancy for DM services to be
provided by academic & research libraries?
Do they have that ambition, expertise and capacity?
Overall questions:
Is it in the nature of such organisations to collaborate on this
topic?
How to organise it and be inclusive to others?
A more efficient Picture?
[Diagram with three labelled boxes]
Std. organisations: Service Req. Definition
Academic Libraries: DM Expertise & Consultancy & Training
EINFRA: Service Provisioning
[Further labels: DM Service Definition; DM Expertise & Consultancy & Training]
QUESTIONS?
EUDAT Recipe
Building Trust between RIs & E-INFRAs
Co-design of services
Agree roles and responsibilities
Bringing e-Infrastructures together
Choosing cooperation rather than
competition
Focus on core business
Leveraging national resources
EU grants and initiatives need to support
the national systems
RDA Uptake by EUDAT
Data Foundation and Terminology (DFT): The DFT WG has established common definitions
of important terms such as “digital object” and of the relations between terms. These results
help us to ensure that we are using the same terminology as our research communities,
many of whom have also been involved in these efforts.
Data Type Registry (DTR): The DTR WG provided a schema and a prototype federated
registry system based on that schema. The schema can be used to formally describe the
content data types in sufficient detail to make them actionable automatically.
PID Information Types (PIT): The PIT WG provided a protocol and an approach to
harmonize the way in which we label the data types associated with a PID, for instance, as
a “checksum” or as an “author” field.
Practical Policy (PP): The PP WG produced a cookbook of best-practice actionable data
policies.
Metadata standards directory: This group transformed the DCC metadata catalogue to
make use of a community maintenance model. This helps ensure that the catalogue of
metadata standards that are in use stays up to date.
DSA/WDS Certification of digital repositories: This group merged two of the data
repository certification schemes, namely the Data Seal of Approval (DSA) and the World
Data System (WDS) certification. This increases the momentum of the certification
schemes and also ensures that the certification taken up by many EUDAT partners is
more relevant.
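The PIT idea of labelling the information attached to a PID can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `PidRecord` class, the placeholder PID string and the choice of SHA-256 are hypothetical and are not the PIT WG's protocol or any EUDAT API. Only the field labels "checksum" and "author" come from the slide.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PidRecord:
    """Hypothetical PID record: an identifier plus labelled information fields."""
    pid: str
    fields: Dict[str, str] = field(default_factory=dict)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (assumed checksum scheme)."""
    return hashlib.sha256(data).hexdigest()


def verify_checksum(record: PidRecord, data: bytes) -> bool:
    """Check data against the record's 'checksum' field; False if the field is absent."""
    expected = record.fields.get("checksum")
    if expected is None:
        return False
    return sha256_hex(data) == expected


# Illustrative record; the PID value is a placeholder, not a real identifier.
payload = b"example research data"
record = PidRecord(
    pid="example-prefix/example-suffix",
    fields={"checksum": sha256_hex(payload), "author": "Example Author"},
)
```

Because the fields carry agreed labels, a generic client could verify any object it retrieves without knowing which repository issued the PID.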
The Complete Picture?
DM Service
Definition
DM Expertise
& Consultancy
& Training
DM Service
Provisioning
The Complete Picture?
DM Service
Provisioning
DM Service
Definition
DM Expertise
& Consultancy
& Training
EUDAT Data Domain modeled on the ANDS¹ Data Curation Continuum
¹ Australian National Data Service – www.ands.org.au
CDI Data Domain
Leveraging Partner Node Services
Individual EUDAT nodes contribute local services to
the CDI service portfolio including expertise &
guidance
Relevant services for the sensitive data domain are:
ePouta from CSC: secure cloud computing
environment
TSD from University of Oslo: secure platform to
collect, store and analyse sensitive data
Some partners are starting to address health & medical
administrative data

OSFair2017 Workshop | Service provisioning for excellent sciences


Editor's Notes

  • #3 WHAT IS EUDAT? - A UNIQUE INITIATIVE OF THE MAJOR EUROPEAN DATA & COMPUTING CENTERS - TOGETHER WITH A LARGE VARIETY OF RESEARCH COMMUNITY ORGANIZATIONS INCLUDING MANY THEMATIC DATA CENTERS THAT BUILD AND OPERATE AN INFRASTRUCTURE FOR DELIVERING GENERAL DM SERVICES AT A PAN-EU LEVEL. THOSE CENTERS ARE ALREADY PROVIDING DM SERVICES TO NATIONAL COMMUNITIES, BUT THESE ARE THEMSELVES USUALLY PART OF WIDER NETWORKS AND NEED TO BE SERVED BY PAN-EUROPEAN SOLUTIONS. THAT IS WHAT MAKES THE DATA CENTERS COME TOGETHER IN EUDAT TO PROVIDE A FEDERATED LAYER OF SERVICES. The major EU research compute & data centers and general IT service providers and a broad variety of community RI organizations. EUDAT2020 is a project of some 20 M€ with 37 consortium partners.
  • #4 INITIATIVE STARTED IN 2011. THE “RIDING THE WAVE” REPORT FROM 2010 WAS OF COURSE VERY INFLUENTIAL, ALSO FOR EUDAT. TWO EC PROJECTS SUPPORTING EUDAT; THE SECOND ONE, EUDAT2020, NOW HAS 37 PARTNERS AND A BUDGET OF 20 M€. Pewi’s email example. Sustainable is bridge to projects and to the CDI PA. Federation in its ‘political’ meaning: centers that have entered into an agreement doing things together.
  • #5 TO CREATE AND OPERATE THE EUDAT INFRASTRUCTURE, EUDAT LEVERAGES SOMETHING CALLED THE CDI. THE CDI IS THE VEHICLE WE USE TO PROVIDE THE EUDAT SERVICES. WITH THE CDI WE CREATE THE ENVIRONMENT THAT ALLOWS US TO DELIVER OUR DM SERVICES. Service providers are the main compute & data centers in Europe. Governed by the CDI council. A collaboration between Service Providers and Research Communities. A Partnership Agreement specifying the mutual obligations between the EUDAT centres. A portfolio of data management services. A data and service model that ensures the CDI’s interoperability, extensibility and stability. Looking to the future, we have the CDI partnership agreement and a secretariat as a basis for future coordination. Consultancy and training: broad-based critical mass of data knowledge; long reach, excellent customer channel. Network of best practice, co-design, technology exchange: sharing of ideas, technologies, solutions across partners.
  • #6 B2DROP as a Dropbox type of collaboration solution; B2SHARE as an archiving/publication platform; B2FIND as an interdisciplinary metadata catalogue. None of these is a killer service on its own, but the integration between them does give considerable added value. Make the integration statement.
  • #7 OUR COMMUNITIES: THE CORE COMMUNITIES ARE PARTNERS IN THE PROJECT. THEY ARE RESEARCH INFRASTRUCTURES, AND WE WORK WITH THEM TO INTEGRATE EUDAT SERVICES IN THEIR WORKFLOW OR CREATE NEW SERVICES, SUCH AS THE DATA SUBSCRIPTION & DISTRIBUTION SERVICE WE WORK ON WITH ELIXIR. Partners in the project whose service uptake is part of the project; long-term commitment. Need them to deliver requirements, test your services and use your services; they should be the purpose in the life of the service providers.
  • #8 BEYOND THE CORE COMMUNITIES WE ENGAGE WITH A LARGE GROUP OF RESEARCH COMMUNITIES AND PROJECTS THROUGH DATA PILOTS, WHERE WE WORK ON SPECIFIC SOLUTIONS FOR THEIR DM REQUIREMENTS. WE ALSO HAVE THE INSTRUMENT OF EUDAT WORKING GROUPS, WHERE EUDAT EXPERTS DISCUSS AND WORK WITH COMMUNITY EXPERTS, ALSO FROM OUTSIDE THE EUDAT PROJECT, ON SPECIFIC DM SUBJECTS: SEMANTICS, ARRAY DB, AND RECENTLY WE STARTED A WG ON SENSITIVE DATA. Refer to the booklet. Mention the topics of the WGs: Semantics; Array Database Technologies; Sensitive Data (newly started): investigate & meet requirements from Data Pilots and partners. Scope: BioMed, Social Sciences, Life Sciences.
  • #12 SUMMARIZING, WHAT ARE THE UNIQUE FEATURES OF EUDAT: CLOSE COLLABORATION WITH THE COMMUNITIES: CO-DESIGN OF SERVICES AND SEPARATION OF ROLES AND RESPONSIBILITIES. WE CAN COLLABORATE WITH OUR COLLEAGUE INFRASTRUCTURES BECAUSE WE KNOW OUR CORE BUSINESS. WE LEVERAGE THE RESOURCES AND INVESTMENTS IN THE NATIONAL SYSTEMS; WE JUST WANT TO MOVE THESE TO A PAN-EU LEVEL, i.e. projects need to come from real needs at national levels to be sustained. NO PARALLEL STRUCTURE. But here there are some recipes from EUDAT that can be applied: Building trust between RIs & EINFRAs. Bringing e-Infrastructures together (EUDAT is trying to do that in various ways, including in the context of the EOSC). Building services that are embedded nationally (not just to serve European homeless scientists!).
  • #17 MADE AVAILABLE TO CDI USERS, BUT OF COURSE AT A COST PRICE DEPENDING ON THE RELATION WITH THE CUSTOMER. Some partners are starting to address health & medical administrative data, e.g. in collaboration with hospitals (this is the case at CSC). TSD provides a platform for researchers working at UiO and in other public research institutions (UH-sector, university hospitals etc.) to collect, store and analyze sensitive research data. ePouta is a Finnish cloud computing environment (Infrastructure as a Service, IaaS) designed for processing sensitive data. It is a closed environment that meets elevated information security level regulations. It is suitable for all fields of science, and also for government and research-sector organisations. The cloud service combines virtual computational resources with the customers' own resources using a dedicated light path. The service is easily scalable to customers' requirements. https://video.nordu.net/media/Web+security+and+Secure+Research+-+NDN16+-+Track3+D3+0900/0_qd6pe7on/44520 (59 minutes into the video). Such services