www.d4science.org
D4SCIENCE DATA INFRASTRUCTURE
Facilitator for a FAIR data management
Pasquale Pagano
CNR – ISTI
(Pisa, Italy)
Outline
Context
Requirements
Virtual Research Environments
Dealing with complexity
FAIR principles
Conclusions
D4Science: Facilitator for a FAIR data management
D4Science is a hybrid data infrastructure:
technologies integrated to provide
elastic access to, and usage of, data and data-management capabilities
• +55 VREs hosted
• +2500 scientists in 44 countries
• +50 data providers
• +25,000 derivative data/month
• over a billion quality records
• +20,000 temporal datasets
• +50,000 spatial datasets
• 99.7% service availability
Humanities and Cultural Heritage
Social Mining
Environmental Studies
Biological and Ecological Studies
Communities' needs
Not individual researchers but groups of researchers, dynamically aggregated to address research questions/problems, who
• are multidisciplinary and involve members belonging to diverse organisations
• cannot rely on costly environments managed by dedicated organisations
• need to access data and services that are spread among many providers
• wish to effectively inject open science into daily tasks
• build and operate their own supporting environments, although the cost and time required to implement this approach largely exceed the available capacities
Requirements for IT systems
Support collaborative research and experimentation
Implement Reproducibility-Repeatability-Reusability
Allow sharing of data and findings
Grant open access to the scientific knowledge and data produced
Simplify access to existing computing and storage resources
Ensure low operational and maintenance costs
Manage heterogeneous data access policies
Virtual Research Environment
An operational environment where a set of resources (data, services, computational, and storage resources) is assigned to a group of users via interfaces for a limited timeframe
L. Candela, D. Castelli, P. Pagano (2013) Virtual Research Environments: An Overview and a Research Agenda. Data Science Journal, Vol. 12
• Created on demand
• Regulated by tailored policies
• No cost for the resource providers
• Open to host and operate custom software
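The VRE definition above can be read as a small data model. The following is a minimal sketch under stated assumptions, not D4Science's implementation: the class name, field names, and the per-user policy dictionary are all hypothetical and only illustrate "resources assigned to a group of users for a limited timeframe, regulated by tailored policies".

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VirtualResearchEnvironment:
    """Hypothetical model of a VRE (names are illustrative assumptions)."""
    name: str
    resources: set[str]   # data, services, computing and storage resources
    members: set[str]     # the group of users
    valid_from: date
    valid_until: date     # the "limited timeframe"
    # Tailored policy: user -> resources that user may use (assumption).
    policy: dict[str, set[str]] = field(default_factory=dict)

    def is_active(self, today: date) -> bool:
        return self.valid_from <= today <= self.valid_until

    def can_use(self, user: str, resource: str, today: date) -> bool:
        """Allow use only for members, inside the timeframe, within the policy."""
        if not self.is_active(today) or user not in self.members:
            return False
        if resource not in self.resources:
            return False
        # Members without an explicit policy entry may use every resource.
        allowed = self.policy.get(user, self.resources)
        return resource in allowed


vre = VirtualResearchEnvironment(
    name="ExampleVRE",
    resources={"dataset:observations", "service:interpolation"},
    members={"alice", "bob"},
    valid_from=date(2017, 1, 1),
    valid_until=date(2017, 12, 31),
    policy={"bob": {"dataset:observations"}},
)
```

In this sketch the VRE itself holds no resources; it only records which existing resources are lent to which users and until when, which is consistent with the "created on demand" point above.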
D4Science Geospatial Interpolation
• In situ observations from the Copernicus Marine Environment Monitoring Service
• Interpolation service: SeaDataNet Data-Interpolating Variational Analysis service (DIVA)
• Estimates global, uniform distributions of environmental parameters from scattered observations
• Exploit the global estimate and run niche modelling to calculate a species distribution
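To make "a uniform distribution estimated from scattered observations" concrete, here is a minimal sketch that uses inverse-distance weighting. This is deliberately a much simpler technique than the variational analysis DIVA performs; it is swapped in only to show the shape of the problem (scattered points in, values on a regular grid out). The function name and the sample values are assumptions.

```python
import math


def idw_interpolate(observations, grid_points, power=2.0):
    """Estimate a value at each grid point from scattered (x, y, value) observations.

    Inverse-distance weighting: nearer observations get larger weights.
    This is NOT the DIVA algorithm, only an illustrative stand-in.
    """
    estimates = []
    for gx, gy in grid_points:
        weighted_sum = 0.0
        weight_total = 0.0
        exact = None
        for ox, oy, value in observations:
            dist = math.hypot(gx - ox, gy - oy)
            if dist == 0.0:
                # Grid point coincides with an observation: use it directly.
                exact = value
                break
            weight = 1.0 / dist ** power
            weighted_sum += weight * value
            weight_total += weight
        estimates.append(exact if exact is not None else weighted_sum / weight_total)
    return estimates


# Two made-up observations and a three-point uniform grid along y = 0.
observations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
grid = [(float(x), 0.0) for x in range(3)]
estimates = idw_interpolate(observations, grid)
```

The midpoint of the grid is equidistant from both observations, so its estimate is their mean; the two outer grid points coincide with observations and keep the observed values.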
The SeaDataNet-D4Science Connector Architecture
[Figure: architecture diagram. Labels: User, Input, Data preparation + Comp. parameters, Workspace, VRE, WPS, REST, NetCDF file, Out. file, Provenance Metadata (Prov-O), Sharing, Other user, Publication, Geospatial data infra. (WMS, WCS, GeoTiff, NetCDF, OPeNDAP), OGC Standards Visualisation]
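The architecture lists WPS as one way to invoke the processing service. As a rough sketch, a WPS 1.0.0 Execute request in key-value (GET) form can be assembled as below. The endpoint URL, process identifier, and input names are placeholders invented for illustration; they are not the actual SeaDataNet-D4Science connector's values.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value Execute request URL.

    DataInputs follows the KVP convention "name=value;name=value".
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process name, and parameters.
url = build_wps_execute_url(
    "https://wps.example.org/wps",
    "example.interpolation",
    {"input_file": "observations.nc", "depth": "10"},
)

# Round-trip check: decode the query string back into parameters.
params = parse_qs(urlparse(url).query)
```

A real deployment would typically also require an authorization token and would return an XML ExecuteResponse that points at the output file; both are omitted here.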
FAIR principles
Findable
• F1. globally unique and eternally persistent identifier
• F2. rich metadata
• F3. indexed in a searchable resource
• F4. metadata specify the data identifier
Accessible
• A1. retrievable by their identifier using a standardized protocol
• A1.1. the protocol is open, free, and universally implementable
• A1.2. the protocol allows for an authentication and authorization procedure
• A2. metadata are accessible, even when the data are no longer available
Interoperable
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• I2. (meta)data use vocabularies that follow FAIR principles
• I3. (meta)data include qualified references to other (meta)data
Re-usable
• R1. (meta)data have a plurality of accurate and relevant attributes
• R1.1. (meta)data are released with a clear and accessible data usage license
• R1.2. (meta)data are associated with their provenance
• R1.3. (meta)data meet domain-relevant community standards
D4Science: Findability
Findability is enabled
• by extending the concept of resources to datasets, methods/algorithms, research objects, and services
• by assigning to each D4Science-managed resource
• a unique identifier
• rich and extensible metadata (including attribution, provenance, and licence information)
• by publishing resources in tailored and global catalogues that support keyword, faceted, and temporal/geospatial discovery
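The findability points above (a unique identifier, rich metadata, keyword and faceted discovery) can be sketched as a tiny in-memory catalogue. Everything here is a hypothetical illustration: the record fields, the `search` function, and the use of a UUID-based URN are assumptions, and a UUID is unique but is not by itself a persistent identifier scheme.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """Hypothetical catalogue entry for any kind of resource."""
    title: str
    resource_type: str   # e.g. dataset, algorithm, research object, service
    keywords: set[str]
    licence: str
    creator: str         # attribution
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


def search(records, keyword=None, **facets):
    """Keyword match plus exact-match facets (e.g. resource_type='dataset')."""
    hits = []
    for record in records:
        if keyword and keyword.lower() not in {k.lower() for k in record.keywords}:
            continue
        if all(getattr(record, name) == value for name, value in facets.items()):
            hits.append(record)
    return hits


catalogue = [
    CatalogueRecord("Sea temperature grid", "dataset",
                    {"ocean", "temperature"}, "CC-BY-4.0", "alice"),
    CatalogueRecord("Interpolation method", "algorithm",
                    {"ocean", "interpolation"}, "CC-BY-4.0", "bob"),
]
hits = search(catalogue, keyword="ocean", resource_type="dataset")
```

Treating datasets and algorithms as the same kind of catalogue record is what lets one search cover both; temporal and geospatial discovery would need extra fields and are left out of this sketch.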
D4Science: Accessibility
Accessibility is obtained
• by making shared and published resources available through multiple protocols, in order to maximise the set of potential exploitation cases
• by also providing transparent Authentication and Authorization whenever the published resource requires it
• by enabling policy enforcement
D4Science: Interoperability
Interoperability is facilitated
• by automatically enriching resources with metadata in multiple formats, including ISO 19115, Darwin Core, Dublin Core, DCAT, and application profiles
• by promoting the exploitation of ontologies and controlled vocabularies
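As an illustration of exposing one internal record in a standard metadata format, the sketch below serialises a plain dictionary as Dublin Core elements in XML. The Dublin Core namespace and element names (`title`, `creator`, `identifier`, `rights`) are the standard ones; the wrapping `metadata` element, the function name, and the sample values are assumptions, and this is not how D4Science itself generates metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record):
    """Serialise a {dc_element: value} dictionary as Dublin Core XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up record for illustration.
record = {
    "title": "Sea temperature grid",
    "creator": "alice",
    "identifier": "urn:example:dataset-1",
    "rights": "CC-BY-4.0",
}
xml_text = to_dublin_core(record)
```

The same dictionary could be fed to other serialisers (for example a DCAT or ISO 19115 writer), which is the point of enriching a resource with metadata in multiple formats.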
D4Science: Reusability
Reusability is promoted
• by systematically endowing shared and published resources with
• a clear licence governing their use/re-use
• citation and attribution statements
• by systematically generating provenance metadata
• by allowing, by design, the experiment to be re-executed in the same technical and contextual environment
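Provenance metadata of the kind mentioned above can be expressed with the W3C PROV-O vocabulary. The sketch below writes a small Turtle document by hand for one computation: which output was generated, by which activity, from which inputs, and on whose behalf. The `prov:` terms are real PROV-O terms; the `ex:` namespace, the identifiers, and the function itself are hypothetical.

```python
def provenance_turtle(output_id, input_ids, activity_id, agent_id, started, ended):
    """Return a Turtle document describing one computation with PROV-O terms."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/> .",  # placeholder namespace
        "",
        f"ex:{output_id} a prov:Entity ;",
        f"    prov:wasGeneratedBy ex:{activity_id} ;",
        f"    prov:wasAttributedTo ex:{agent_id} .",
        "",
        f"ex:{activity_id} a prov:Activity ;",
    ]
    # One prov:used statement per input of the computation.
    lines += [f"    prov:used ex:{input_id} ;" for input_id in input_ids]
    lines += [
        f"    prov:wasAssociatedWith ex:{agent_id} ;",
        f'    prov:startedAtTime "{started}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{ended}"^^xsd:dateTime .',
        "",
        f"ex:{agent_id} a prov:Agent .",
    ]
    lines += [f"ex:{input_id} a prov:Entity ." for input_id in input_ids]
    return "\n".join(lines)


ttl = provenance_turtle(
    output_id="output_nc",
    input_ids=["observations_nc", "parameters"],
    activity_id="run1",
    agent_id="alice",
    started="2017-06-01T10:00:00Z",
    ended="2017-06-01T10:05:00Z",
)
```

Recording the inputs and parameters alongside the output is what makes it possible to repeat the same computation later; a production system would more likely use an RDF library than string formatting.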
D4Science enacts FAIR because it
• embraces an as-a-Service approach
• exploits communication standards
• hides the complexity of computational capabilities
• enables access via VREs governed by tailored policies
• facilitates provenance and attribution management
• implements economies of scale and cost reduction
• promotes collaboration and sharing
• enables re-usability
THANK YOU
Contact Points
pasquale.pagano@isti.cnr.it
www.d4science.org
info@d4science.org
