EUDAT and Big Data in Science
Wolfgang Gentzsch, Advisor, EUDAT
HPCC 2013, Newport, RI, 26-28 March 2013

In this video from the 2013 National HPCC Conference, Wolfgang Gentzsch presents: EUDAT and Big Data in Science.

Big data science is emerging as a new paradigm for scientific discovery, one that reflects the increasing value of observational, experimental and computer-generated data in virtually all domains, from physics to the humanities and social sciences. Addressing this new paradigm, the EUDAT project is a European data initiative that brings together a unique consortium of 25 partners from 13 countries, including research communities, national data and high performance computing (HPC) centers, technology providers, and funding agencies. EUDAT aims to build a sustainable cross-disciplinary and cross-national data infrastructure that provides a set of shared services for accessing and preserving research data. The design and deployment of these services are being coordinated by multi-disciplinary task forces comprising representatives from research communities and data centers.

You can watch the presentation with audio at insideHPC: http://insidehpc.com/2013/03/27/video-eudat-and-big-data-in-science/

EUDAT and Big Data in Science

  1. EUDAT and Big Data in Science
     Wolfgang Gentzsch, Advisor, EUDAT
     HPCC 2013, Newport, RI, 26-28 March 2013
  2. Data trends
     Exponential growth: Gigabytes, Terabytes, Petabytes, Exabytes, Zettabytes
     Increasing complexity and variety
     • Where to store it?
     • How to find it?
     • How to make the most of it?
     • How to ensure interoperability?
  3. The EUDAT Case
     If there are hundreds of Research Infrastructures, how many different data management systems can we sustain?
  4. Collaborative Data Infrastructure: a framework for the future?
     • Data Generators and Users: user functionalities, data capture & transfer, virtual research environments
     • Community Support Services: data discovery & navigation, workflow generation, annotation, interpretability
     • Common Data Services: persistent storage, identification, authenticity, workflow execution, mining
     • Across all layers: Data Curation, Trust
  5. (no text in transcript)
  6. Data Centers and Communities
  7. Five research communities on board
     • EPOS: European Plate Observing System
     • CLARIN: Common Language Resources and Technology Infrastructure
     • ENES: Service for Climate Modelling in Europe
     • LifeWatch: Biodiversity Data and Observatories
     • VPH: The Virtual Physiological Human
     • All share common challenges:
       – Reference models and architectures
       – Persistent data identifiers
       – Metadata management
       – Distributed data sources
       – Data interoperability
  8-13. (no text in transcript)
  14. Communities ↔ Data Centers
  15. Building Blocks of the CDI
      • EUDAT Portal: integrated APIs and harmonized access to EUDAT facilities
      • Metadata Catalogue: aggregated EUDAT metadata domain; data inventory
      • AAI: network of trust among authentication and authorization actors
      • Data Staging: dynamic replication to HPC workspace for processing
      • Safe Replication: data curation and access optimization
      • Simple Store: researcher data store (simple upload, share and access)
  16. SAFE_REPLICATION@EUDAT
      Allow communities to replicate data to selected data centers for storage and do this in a robust, reliable and highly available manner. Improve data curation and accessibility.
      More info: eudat-safereplication@postit.csc.fi
  17. DATA_STAGING@EUDAT
      Allow the communities to dynamically replicate a subset of their data stored in EUDAT to an HPC workspace in order to be processed.
      More info: eudat-datastaging@postit.csc.fi
  18. METADATA@EUDAT
      Create a joint metadata domain for all data stored by EUDAT data centers and a catalogue which exposes the data stored within EUDAT, allowing data searches. The EUDAT repository should provide an inventory of metadata from different communities.
      More info: eudat-metadata@postit.csc.fi
  19. SIMPLE_STORE@EUDAT
      Create an easy-to-use service that will help researchers, mediated by the participating communities, to upload and store data which is not part of the officially handled data sets of the community. This service will address the long tail of “small” data and the researchers/citizen scientists creating/manipulating them.
      More info: eudat-simplestore@postit.csc.fi
  20. Persistent_Identifiers@EUDAT
      Deploy a robust, highly available and effective PID service that can be used within the communities and by EUDAT. Keeping track of the “names” of data sets deposited with the CDI requires robust mechanisms.
      More info: eudat-persistentidentifiers@postit.csc.fi
  21. AAI@EUDAT
      Provide a solution for a working AAI system in a federated scenario. Design the AA infrastructure to be used during the EUDAT project and beyond.
      More info: eudat-AAI@postit.csc.fi
  22. OPERATION TEAM
  23. Work plan for the next months
      • Moving the services to a production environment
      • Capturing additional requirements
      • Integrating new partners to EUDAT (in particular research communities)
        – Working groups, pilots, observers and associate partners
      • Collaborating with other initiatives
        – European e-Infrastructures: EGI, PRACE, DANTE, HELIX NEBULA, SCIDIPS-ES, etc.
        – Global initiatives: RDA, CODATA, etc.
      • Defining EUDAT’s path to sustainability
        – Cost and funding models
        – Governance
  24. Welcome to the 2nd EUDAT Conference!
      28-30 October 2013, Rome
      • International event with keynotes from Europe and US
      • A forum to discuss the future of data infrastructures
      • Project presentations and poster sessions
      • Training tutorials
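
The Safe Replication and Persistent Identifiers slides describe goals rather than mechanisms. As a rough illustration of the idea only, the sketch below copies a file to several storage locations, verifies each copy by checksum, and records the replicas under an identifier. Everything in it is an assumption made for illustration: local directories stand in for data centers, a random UUID with a made-up prefix stands in for a PID, and the names `replicate`, `sha256_of` and `demo` are hypothetical. It is not EUDAT's implementation.

```python
import hashlib
import shutil
import tempfile
import uuid
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, centers: list[Path], registry: dict) -> str:
    """Copy `source` into every directory in `centers` and verify each copy.

    Directories are stand-ins for data centers (an assumption of this sketch).
    Returns a hypothetical identifier under which the replicas are recorded.
    """
    checksum = sha256_of(source)
    replicas = []
    for center in centers:
        center.mkdir(parents=True, exist_ok=True)
        target = center / source.name
        shutil.copy2(source, target)
        # Refuse to register a replica whose content differs from the source.
        if sha256_of(target) != checksum:
            raise IOError(f"checksum mismatch for replica {target}")
        replicas.append(str(target))
    # A random UUID with an invented prefix plays the role of a PID here.
    pid = f"hypothetical-prefix/{uuid.uuid4()}"
    registry[pid] = {"checksum": checksum, "replicas": replicas}
    return pid


def demo() -> dict:
    """Replicate a small file to two stand-in centers and summarize the record."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "dataset.txt"
        source.write_text("observational data\n")
        registry: dict = {}
        pid = replicate(source, [root / "center_a", root / "center_b"], registry)
        record = registry[pid]
        return {
            "pid": pid,
            "replica_count": len(record["replicas"]),
            "checksum_ok": record["checksum"] == sha256_of(source),
        }
```

Calling `demo()` returns a small summary dictionary. The design point is that the identifier maps to a checksum plus a list of locations, so a later reader can find any replica and confirm it is intact.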
