Sharing big data
15 June 2017
Bob Jones
CERN
Bob.Jones <at> cern.ch
Helix Nebula – The Science Cloud
Helix Nebula – The Science Cloud with Grant Agreement 687614 is a Pre-Commercial Procurement Action
funded by the H2020 Framework Programme
Accelerating Science and Innovation
Data in High-Energy Physics
Based on DPHEP Study Group (2009). Data Preservation in High Energy Physics. http://arxiv.org/abs/0912.0255
Patricia Herterich
EPFL & SDSC visit 2017-03-24
CERN Open Data Portal
• 2015: 40 TB of 2010 data
• 2016: 320 TB of 2011 data
• Curation and release of:
  - Simulated data (MC)
  - Trigger information
  - Configuration files
http://github.com/cernopendata
Barend Mons, Leiden University Medical Center
In the FAIR Data approach, data should be:
• Findable – Easy to find by both humans and computer systems,
based on mandatory metadata descriptions that allow the
discovery of interesting datasets
• Accessible – Stored for the long term so that they can be easily
accessed and/or downloaded under well-defined license and access
conditions (Open Access when possible), whether at the level of
metadata or at the level of the actual data content
• Interoperable – Ready to be combined with other datasets by
humans as well as computer systems
• Reusable – Ready to be used for future research and to be processed
further using computational methods.
https://www.dtls.nl/fair-data/
Peter Doorn, Director DANS
https://www.force11.org/group/fairgroup/fairprinciples
27/06/2017
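As an illustration of the FAIR definitions above, here is a minimal sketch of how a metadata record might be checked for fields that support each principle. The field names and the grouping by principle are assumptions made for this example only; they do not come from the FAIR principles themselves or from any particular repository's schema.

```python
# Hypothetical field names, grouped by the FAIR principle they would support.
# This mapping is an assumption for illustration, not a standard.
FAIR_FIELDS = {
    "findable": ["identifier", "title", "description"],
    "accessible": ["access_url", "license"],
    "interoperable": ["format"],
    "reusable": ["provenance"],
}


def missing_fair_fields(record):
    """Return {principle: [missing or empty field names]} for a metadata record."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        # A field counts as missing if it is absent or has an empty value.
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


example_record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example dataset",
    "description": "Illustrative record only",
    "access_url": "https://example.org/data/1",  # placeholder URL
    "license": "",  # empty: access conditions not stated
    "format": "text/csv",
}

print(missing_fair_fields(example_record))
```

A check like this only confirms that descriptive fields are present; whether a dataset is actually reusable still depends on the quality of what those fields contain.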
The Hybrid Cloud Model
Brings together
• research organisations,
• data providers,
• publicly funded e-infrastructures,
• commercial cloud service providers
in a hybrid cloud with procurement and governance
approaches suitable for the dynamic cloud market
Data Commons is a Platform that fosters development of a digital Ecosystem
Treats products of research (data, software, methods, papers, training
materials, etc.) as digital assets (objects)
Digital objects need to conform to FAIR principles
- Findable, Accessible, Interoperable, Reusable
Digital objects exist in a shared virtual space (initial)
- Find, Deposit, Manage, Share and Reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support
them
Philip E. Bourne, Ph.D. FACMI
Associate Director for Data Science
National Institutes of Health, USA
Data Commons Pilot – connecting the pieces
Co-location of large and/or highly
utilized NIH-funded data on the cloud
with commonly used tools for analyzing
and sharing digital objects,
to create an interoperable resource for
the research community.
Investigators will be able to collaborate
and share digital objects within this
environment and connect with others.
Impact
Biggest issuer of DOIs for software in the world
Reference material for publications
F1000, Wiley, eLife, PLoS, Elsevier, Nature, etc.
Recommended by EC and National programmes
https://www.zenodo.org/
Summary
Sharing big data needs technology, processes
and organisation, and people
FAIR principles represent best practice
Findable, Accessible, Interoperable, Reusable
Research communities around the world are
developing science commons to accelerate
the sharing of digital assets