EMBL Australian Bioinformatics Resource AHM - Data Commons (Vivien Bonazzi)
This document discusses the development of the NIH Data Commons, which aims to create a shared framework and infrastructure for biomedical data. It notes the growing volume of data being generated, increased support for data sharing, and the need for interoperability. The Data Commons framework treats data, tools, and publications as digital objects that are findable, accessible, interoperable and reusable. Current pilots are exploring the feasibility of the framework, including deploying reference datasets in the cloud, indexing data and tools, and a credits system for cloud resources. Considerations in fully realizing the commons include metrics, costs, standards, discoverability, policies, incentives and sustainability. The framework's relevance for supporting open data in Australia is also addressed.
dkNET Webinar: FAIR Data & Software in the Research Life Cycle, 01/22/2021 (dkNET)
Abstract
Good data stewardship is the cornerstone of knowledge, discovery, and innovation in research. The FAIR Data Principles address data creators, stewards, software engineers, publishers, and others to promote maximum use of research data. The principles can be used as a framework for fostering and extending research data services.
This talk will provide an overview of the FAIR principles and the drivers behind their development by a broad community of international stakeholders. We will explore a range of topics related to putting FAIR data into practice, including how and where data can be described, stored, and made discoverable (e.g., data repositories, metadata); methods for identifying and citing data; interoperability of (meta)data; best-practice examples; and tips for enabling data reuse (e.g., data licensing). Practical examples of how FAIR is applied will be provided along the way.
Presenter: Christopher Erdmann, Engagement, support, and training expert on the NHLBI BioData Catalyst project at University of North Carolina Renaissance Computing Institute
dkNET Webinars Information: https://dknet.org/about/webinar
Micah Altman, Harvard; Policy-based Data Management
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
The document discusses a global initiative to facilitate open access to scholarly resources and research data across boundaries by building a federation of registries. It provides use cases of how such a system could help postgraduate students, research project leaders, administrators, and ICT specialists discover and monitor globally accessible data relevant to their work. The proposed strategy is to create a "Register of Registries" that would enable consistent discovery services for finding data in collections through a standardized, interoperable model. An initial scoping meeting was held in 2007, with annual meetings since then to develop the strategy.
Smith RDAP11 NSF Data Management Plan Case Studies (ASIS&T)
MacKenzie Smith, MIT; NSF Data Management Plan Case Studies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Rots RDAP11 Data Archives in Federal Agencies (ASIS&T)
Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
This document discusses several studies on user engagement in research data curation. It finds that institutional repositories for data were developed without input from researchers, leading to systems that did not meet researchers' needs. Barriers to open data sharing included concerns over commercial use and maintaining ownership. Successful data curation requires understanding disciplinary differences and developing trusted relationships with researchers through dialogue early in projects.
FAIR Data Bridging from researcher data management to ELIXIR archives in the... (Carole Goble)
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based data management, and support researchers across their whole RDM lifecycle, navigating the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aim to do just that.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that authors’ credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020, 2:1-2, 108-121. https://doi.org/10.1162/dint_a_00033
This document discusses the Sustainable Environment - Actionable Data (SEAD) project. SEAD aims to provide data services to sustainability researchers by developing tools that address challenges like heterogeneous and small datasets. It plans to move data curation upstream, involve domain scientists, and leverage social media and metadata. SEAD will integrate these active curation services into a federated infrastructure to preserve datasets long-term. The project is led by researchers from multiple institutions and funded by the National Science Foundation.
Digital Library Federation - DataNets Panel presentation, Nov. 1st, 2011 (SEAD)
This document summarizes a panel discussion on the NSF funded Datanet partnerships program. It introduces the panelists from various Datanet projects including SEAD, TerraPop, Datanet Federation Consortium, and DataOne. It then provides more detail on the goals and strategies of the SEAD project, which aims to develop tools and services to address the needs of long-tail sustainability research by leveraging social curation and active metadata. SEAD works to move data curation upstream and engage researchers throughout the project using automated metadata and volunteered contributions.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at GFBio - de.NBI Summer School 2018 Riding the Data Life Cycle! Braunschweig Integrated Centre of Systems Biology (BRICS), 03 - 07 September 2018
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a... (SEAD)
This document discusses the Sustainable Environment Actionable Data (SEAD) project, which aims to lower the costs and increase the value of data curation through a data lifecycle approach. SEAD provides lightweight data services to support sustainability research, including secure project workspaces, active and social curation tools, and integrated lifecycle support for data from ingest to long-term preservation. By leveraging technologies like Web 2.0 and standards, SEAD simplifies and automates curation processes using metadata captured from data producers and users. This allows curation activities to begin earlier in the data lifecycle and be distributed across researchers and curators.
The document summarizes a workshop on geospatial metadata and spatial data. It discusses the importance of metadata for managing and sharing spatial datasets, providing key information about the data. It also covers metadata standards like FGDC, ISO 19115, and application profiles. The workshop includes presentations on the UK Academic Geospatial Metadata Application Profile and tools for creating metadata like the Geodoc Metadata Editor and Go-Geo portal.
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; bring FAIR by Design methodologies and platforms into the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
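The abstract above points to Schema.org as an industry-led route to making data findable. A minimal sketch of what such dataset markup can look like, with every name, identifier and URL invented for illustration:

```python
import json

# Minimal Schema.org "Dataset" record as JSON-LD, the kind of markup a
# repository embeds in a landing page so dataset search engines can harvest
# it. All names, identifiers and URLs below are hypothetical.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example plant phenotype measurements",
    "description": "Hypothetical measurements used to illustrate the markup.",
    "identifier": "https://doi.org/10.1234/example",  # persistent identifier (F)
    "license": "https://creativecommons.org/licenses/by/4.0/",  # reuse terms (R)
    "distribution": {
        "@type": "DataDownload",  # Schema.org type for a retrievable copy (A)
        "contentUrl": "https://repo.example.org/phenotypes.csv",
        "encodingFormat": "text/csv",  # standard media type (I)
    },
}

print(json.dumps(dataset, indent=2))
```

Embedded as JSON-LD in a landing page, a record like this is exactly what web crawlers consume, which is why the approach is described as industry-led rather than research-specific.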
The document proposes the creation of a federated cloud computing platform called "The Commons" to support biomedical data sharing and analysis across multiple cloud providers. Key points:
- The Commons would index metadata and digital objects across conformant public and private cloud providers.
- It would be funded by providing credits to investigators for storage and computing, creating competition among providers to offer better services at lower costs.
- A phased implementation is outlined to initially involve experienced users and later expand to all NIH grantees.
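As a rough illustration of the indexing idea above, the sketch below resolves a persistent object identifier to one of several cloud-hosted copies; all identifiers, provider names and URLs are hypothetical:

```python
# Hypothetical sketch of a Commons-style index: persistent object
# identifiers map to metadata plus the conformant providers holding copies.
index = {
    "commons:obj/7f3a": {
        "metadata": {"title": "Reference genome build", "format": "FASTA"},
        "locations": [
            {"provider": "cloud-a", "url": "https://cloud-a.example/7f3a"},
            {"provider": "cloud-b", "url": "https://cloud-b.example/7f3a"},
        ],
    },
}

def resolve(object_id, preferred_provider=None):
    """Return a download URL for a digital object, preferring a provider."""
    entry = index[object_id]
    for loc in entry["locations"]:
        if loc["provider"] == preferred_provider:
            return loc["url"]
    return entry["locations"][0]["url"]  # fall back to the first copy
```

A real Commons index would sit behind a web API and carry far richer metadata; the point is only that investigators reference the identifier, not a storage location, which is what lets providers compete on cost and service.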
The swings and roundabouts of a decade of fun and games with Research Objects (Carole Goble)
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to 17th Italian Research Conference on Digital Libraries
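RO-Crate, mentioned above, packages a dataset and its description into one directory carrying a single JSON-LD metadata file, ro-crate-metadata.json. A minimal sketch following the RO-Crate 1.1 layout, with the packaged file name invented for illustration:

```python
import json

# Minimal RO-Crate metadata following the 1.1 layout: one JSON-LD graph
# describing the metadata file itself, the root dataset (the packaged
# directory), and its files. "results.csv" is a hypothetical file name.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file descriptor, pointing at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: the directory being packaged
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis bundle",
            "hasPart": [{"@id": "results.csv"}],
        },
        {   # one packaged file
            "@id": "results.csv",
            "@type": "File",
        },
    ],
}

print(json.dumps(crate, indent=2))
```

Because the container is a plain directory and the metadata is plain JSON-LD, a crate stays readable without any special tooling, which is central to the Research Object motivation described above.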
D4Science Data infrastructure: a facilitator for FAIR data management (Research Data Alliance)
D4Science is a hybrid data infrastructure that integrates technologies to provide elastic access and usage of data and data management capabilities. It hosts over 50 virtual research environments for over 2500 scientists across 44 countries. D4Science aims to facilitate FAIR (Findable, Accessible, Interoperable, Re-usable) data management by assigning unique identifiers and rich metadata to resources, publishing catalogs to enable discovery, making resources available through standards, adding metadata in multiple formats, and requiring licenses and provenance to promote reuse.
Trusted data repositories and the CoreTrustSeal: webinar on March 13th 2018 from ANDS-Nectar-RDS, in which NIF discusses its journey to becoming a trusted data repository with the CoreTrustSeal. Presented by Andrew Mehnert.
Recordings and transcript available from the ANDS website: http://www.ands.org.au/news-and-events/presentations/2018
This document discusses drivers and organizational responses to research data management (RDM) maturity from transatlantic perspectives. It describes external funder mandates in the US and UK that require open sharing of research publications and data. Universities have responded by developing RDM policies, tools, expertise, and education/outreach for researchers. Key RDM components discussed include policies, storage and repository tools, expertise and staffing models, and outreach/education activities. Connecting electronic lab notebooks to other RDM infrastructure is presented as an approach to better integrate researcher workflows with institutional RDM. The document concludes with an invitation to provide comments on RDM maturity through an online survey.
Practical and Conceptual Considerations of Research Object Preservation (SEAD)
This document discusses research object (RO) frameworks for preserving digital research data. It addresses the challenges of research spanning long periods of time and involving complex, heterogeneous data that changes states. The research object framework aims to capture agents, states, relationships, and content to enable automation, reproducibility, and reuse of research. The framework defines three states for research objects - live, curated, and published. Live objects are works in progress, curated objects are packaged for preservation, and published objects are immutable and citable. The framework allows documentation of research processes and outputs to build trust and facilitate reuse.
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report - http://tinyurl.com/FAIR-EG
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based, and support researchers across their whole RDM lifecycle, navigating the complexity this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aims to do just that.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that author’s credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Science using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes,Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober FAIR Computational Workflows Data Intelligence 2020 2:1-2, 108-121 https://doi.org/10.1162/dint_a_00033.
This document discusses the Sustainable Environment - Actionable Data (SEAD) project. SEAD aims to provide data services to sustainability researchers by developing tools that address challenges like heterogeneous and small datasets. It plans to move data curation upstream, involve domain scientists, and leverage social media and metadata. SEAD will integrate these active curation services into a federated infrastructure to preserve datasets long-term. The project is led by researchers from multiple institutions and funded by the National Science Foundation.
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that author’s credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2(1-2): 108-121. https://doi.org/10.1162/dint_a_00033
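The core idea above, that a workflow is an explicit, inspectable description of steps and their data dependencies, can be sketched in plain Python. This is an illustrative toy with hypothetical step names, not the DSL of any real system such as Galaxy, Snakemake or Nextflow:

```python
# Toy illustration: a workflow as an explicit description of steps and the
# steps each one depends on. Because the method is declared as data, it can
# be inspected, shared and re-run -- the property the abstract highlights.
from graphlib import TopologicalSorter

# Hypothetical analysis steps; each maps to the steps it depends on.
workflow = {
    "download_reads": [],
    "quality_control": ["download_reads"],
    "align": ["quality_control"],
    "call_variants": ["align"],
    "report": ["call_variants", "quality_control"],
}

def run_order(deps):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(workflow)
```

A real workflow management system adds scheduling, caching, provenance capture and scale-out on top of exactly this kind of dependency description.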
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011) (SEAD)
This document summarizes a panel discussion on the NSF funded Datanet partnerships program. It introduces the panelists from various Datanet projects including SEAD, TerraPop, Datanet Federation Consortium, and DataOne. It then provides more detail on the goals and strategies of the SEAD project, which aims to develop tools and services to address the needs of long-tail sustainability research by leveraging social curation and active metadata. SEAD works to move data curation upstream and engage researchers throughout the project using automated metadata and volunteered contributions.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a... (SEAD)
This document discusses the Sustainable Environment Actionable Data (SEAD) project, which aims to lower the costs and increase the value of data curation through a data lifecycle approach. SEAD provides lightweight data services to support sustainability research, including secure project workspaces, active and social curation tools, and integrated lifecycle support for data from ingest to long-term preservation. By leveraging technologies like Web 2.0 and standards, SEAD simplifies and automates curation processes using metadata captured from data producers and users. This allows curation activities to begin earlier in the data lifecycle and be distributed across researchers and curators.
The document summarizes a workshop on geospatial metadata and spatial data. It discusses the importance of metadata for managing and sharing spatial datasets, providing key information about the data. It also covers metadata standards like FGDC, ISO 19115, and application profiles. The workshop includes presentations on the UK Academic Geospatial Metadata Application Profile and tools for creating metadata like the Geodoc Metadata Editor and Go-Geo portal.
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey towards wider accessibility and reusability of data and readiness for automation (I am one of the army of authors). Over the past five years FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly for the commercial and public sectors. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean, and how to implement them, has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; bring FAIR-by-Design methodologies and platforms into the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
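As an illustration of the Schema.org approach mentioned in the abstract above, a dataset landing page can embed a JSON-LD description that web crawlers and knowledge graphs can consume. A minimal sketch follows; the DOI, title and URLs are made-up placeholders, and real deployments typically follow the Schema.org/Bioschemas Dataset profile more fully:

```python
import json

# Minimal Schema.org "Dataset" description (all values are illustrative).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example assay measurements",               # hypothetical title
    "identifier": "https://doi.org/10.1234/example",    # hypothetical DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/measurements.csv",
    },
}

# Serialised JSON-LD, ready to embed in a <script type="application/ld+json"> tag.
jsonld = json.dumps(dataset, indent=2)
```

Markup like this is one lightweight route to the F and A of FAIR: the record carries an identifier, a licence and a retrieval URL in a form machines can index.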
The document proposes the creation of a federated cloud computing platform called "The Commons" to support biomedical data sharing and analysis across multiple cloud providers. Key points:
- The Commons would index metadata and digital objects across conformant public and private cloud providers.
- It would be funded by providing credits to investigators for storage and computing, creating competition among providers to offer better services at lower costs.
- A phased implementation is outlined to initially involve experienced users and later expand to all NIH grantees.
The swings and roundabouts of a decade of fun and games with Research Objects (Carole Goble)
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to the 17th Italian Research Conference on Digital Libraries
D4Science Data infrastructure: a facilitator for FAIR data management (Research Data Alliance)
D4Science is a hybrid data infrastructure that integrates technologies to provide elastic access and usage of data and data management capabilities. It hosts over 50 virtual research environments for over 2500 scientists across 44 countries. D4Science aims to facilitate FAIR (Findable, Accessible, Interoperable, Re-usable) data management by assigning unique identifiers and rich metadata to resources, publishing catalogs to enable discovery, making resources available through standards, adding metadata in multiple formats, and requiring licenses and provenance to promote reuse.
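The FAIR facets listed above (unique identifiers, rich metadata, standards-based access, licences and provenance) can be pictured as a minimal checklist over a catalogue record. The sketch below is illustrative only; the field names are assumptions, not D4Science's actual schema:

```python
# Illustrative FAIR checklist over a catalogue record (field names are
# assumptions for this sketch, not D4Science's actual data model).
REQUIRED_FIELDS = {
    "identifier",   # F: globally unique, persistent identifier
    "title",        # F: rich, searchable metadata
    "access_url",   # A: retrievable via a standard protocol
    "format",       # I: data available in a known, standard format
    "license",      # R: clear conditions for reuse
    "provenance",   # R: where the data came from
}

def missing_fair_fields(record: dict) -> set:
    """Return the checklist fields a record has not filled in."""
    return {f for f in REQUIRED_FIELDS if not record.get(f)}

# A hypothetical record that covers everything except provenance.
record = {
    "identifier": "https://doi.org/10.1234/demo",
    "title": "Demo dataset",
    "access_url": "https://example.org/data.csv",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}
gaps = missing_fair_fields(record)
```

Catalogue software that enforces such a checklist at deposit time is one concrete way an infrastructure "requires licences and provenance to promote reuse".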
Trusted data repositories and the CoreTrustSeal: webinar on March 13th 2018 from ANDS-Nectar-RDS, in which NIF discuss their journey to becoming a trusted data repository with the CoreTrustSeal. Presented by Andrew Mehnert.
Recordings and transcript available from the ANDS website: http://www.ands.org.au/news-and-events/presentations/2018
This document discusses drivers and organizational responses to research data management (RDM) maturity from transatlantic perspectives. It describes external funder mandates in the US and UK that require open sharing of research publications and data. Universities have responded by developing RDM policies, tools, expertise, and education/outreach for researchers. Key RDM components discussed include policies, storage and repository tools, expertise and staffing models, and outreach/education activities. Connecting electronic lab notebooks to other RDM infrastructure is presented as an approach to better integrate researcher workflows with institutional RDM. The document concludes with an invitation to provide comments on RDM maturity through an online survey.
Practical and Conceptual Considerations of Research Object Preservation (SEAD)
This document discusses research object (RO) frameworks for preserving digital research data. It addresses the challenges of research spanning long periods of time and involving complex, heterogeneous data that changes states. The research object framework aims to capture agents, states, relationships, and content to enable automation, reproducibility, and reuse of research. The framework defines three states for research objects - live, curated, and published. Live objects are works in progress, curated objects are packaged for preservation, and published objects are immutable and citable. The framework allows documentation of research processes and outputs to build trust and facilitate reuse.
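The three-state lifecycle described above (live, then curated, then published) can be sketched as a small state machine. This is an illustration of the idea only; the SEAD research object framework defines the states, not this code:

```python
# Illustrative state machine for the research-object lifecycle described
# above: live objects are mutable works in progress, curated objects are
# packaged for preservation, published objects are immutable and citable.
TRANSITIONS = {
    "live": {"curated"},        # package a work-in-progress for preservation
    "curated": {"published"},   # release an immutable, citable version
    "published": set(),         # published objects never change state again
}

class ResearchObject:
    def __init__(self, name: str):
        self.name = name
        self.state = "live"     # every object starts as a work in progress

    def advance(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

ro = ResearchObject("field-survey-data")  # hypothetical object name
ro.advance("curated")
ro.advance("published")
```

Making the legal transitions explicit is what lets a repository guarantee that published objects stay immutable while live ones remain editable.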
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report: http://tinyurl.com/FAIR-EG
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify a suitable repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
Virtual Research Environments supporting tailor-made data management service... (BlueBRIDGE)
Presented by Pasquale Pagano of CNR at the BlueBRIDGE Workshop at SeaTech Week 2016 in Brest, France. http://www.bluebridge-vres.eu/events/join-bluebridge-10th-biennial-sea-tech-week-brest-france
The document discusses the Enabling FAIR Data project, which aims to improve data sharing practices in earth and environmental sciences. It outlines the FAIR data principles, key stakeholders in the project including publishers and repositories, and outputs including a commitment statement, repository finder tool, and shared authoring guidelines. The next steps are to encourage more organizations to sign and implement the commitment statement and guidelines to promote open and interoperable data.
dkNET Office Hours: NIH Data Management and Sharing Mandate 05/03/2024 (dkNET)
Presenter: Jeffrey Grethe, PhD, Principal Investigator of NIDDK Information Network (dkNET), Center for Research in Biological Systems, University of California San Diego
For all proposals submitted on or after January 25, 2023, NIH requires the sharing of data from all NIH-funded studies. Do you have appropriate data management practices and sharing plans in place to meet these requirements? Have questions or need some help? Join the dkNET office hours to learn about NIH’s policy (NOT-OD-21-013) and resources that could help.
*Previous Office Hours Slides and Recording: https://dknet.org/rin/research-data-management
Upcoming Webinars Schedule: https://dknet.org/about/webinar
Virtual research environments for implementing long tail open science (BlueBRIDGE)
This document discusses virtual research environments (VREs) for supporting "long-tail open science". It defines VREs as operational environments that dynamically aggregate resources like data, services, and computing/storage for users. VREs aim to support collaborative research, reproducibility, and open sharing of data/findings while providing simplified access. The document outlines how VREs can be created on demand, integrated with applications/services, and used for collaborative experiments and workflows to enable repeatability and reuse of research. Real-world examples of VREs like D4Science are presented.
A brief introduction of dkNET (NIDDK Information Network; https://dknet.org) and the services and resources that are available, including Resource Reports, Authentication Reports, FAIR Data Services, Discovery Portal and Hypothesis Center.
Research Data Management in GLAM: Managing Data for Cultural Heritage (Sarah Anna Stewart)
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha... (dkNET)
This document summarizes an online meeting about resources available from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) to support researchers in implementing the new 2023 NIH data management and sharing policy. It discusses the NIDDK Central Repository, which acquires, maintains and distributes data and biospecimens from NIDDK-funded clinical studies. Eligibility and requirements for submitting resources to the repository are also covered.
re3data.org – Registry of Research Data Repositories (Heinz Pampel)
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
This document summarizes Simon Hodson's presentation on open science and FAIR data developments globally. Some key points:
1) There is a growing policy push for open research data, with funders and organizations adopting data sharing policies based on FAIR data principles of findability, accessibility, interoperability, and reusability.
2) Initiatives are working to build the international ecosystem of open science, including components for reporting research outputs, persistent identifiers, data standards, data repositories, and criteria for trustworthy data.
3) The African Open Science Platform aims to lay the foundations for open science in Africa through frameworks for policy, incentives, training, and technical infrastructure development.
4) International
RDMkit, a Research Data Management Toolkit. Built by the Community for the ... (Carole Goble)
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community, to provide the RDM information they need. It is a framework for advice and best practice in RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit; aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
Turning FAIR into Reality: Final outcomes from the European Commission FAIR D... (Sarah Jones)
A multi-speaker presentation given by the European Commission FAIR Data Expert Group at SciDataCon, as part of International Data Week in Botswana in November 2018.
Simon Hodson, Chair of the Group explained the remit and background. Natalie Harrower outlined key concepts. Francoise Genova spoke on the recommendations related to research data culture. Daniel Mietchen addressed the infrastructure needed and our proposals for a FAIR ecosystem, and Sarah Jones spoke to the cultural aspects needed to drive change and outlined the FAIR Action Plan.
The report has been revised in light of the 500+ comments received as part of the open consultation and will be formally released on 23rd November as part of the Austrian Presidency events.
FAIR Data Management and FAIR Data Sharing (Merce Crosas)
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
The document discusses sharing research data through open data platforms. It describes the CGIAR as uniquely positioned to collect agricultural data worldwide and argues that most CGIAR data should be archived and shared to increase its value. However, data archiving across CGIAR centers is currently poor. The document then discusses using the Dataverse platform to improve data sharing. Dataverse allows researchers to publish, share, cite, and analyze data. It also facilitates making data available while giving credit to data authors and institutions.
Similar to D4Science Data Infrastructure - Facilitator for a FAIR Data Management
The BIG picture - Advanced data visualization for SDG, basic stock assessment... (BlueBRIDGE)
This document discusses several applications that have been developed using EU e-infrastructures to support blue growth. It describes applications for advanced data visualization, stock assessment, environmental monitoring, modeling of invasive species and events, fisheries data analysis, protected areas impact mapping, and aquaculture monitoring. These applications provide access to biodiversity, environmental, and fisheries data and make use of computing resources and services to analyze data and produce outputs in reusable, interoperable formats.
Global Record of Stocks and Fisheries (GRSF) (BlueBRIDGE)
The Global Record of Stocks and Fisheries (GRSF) is an inventory of global stocks and fisheries records from multiple data providers. It assigns unique identifiers to standardized stock and fishery identifications. The GRSF knowledge base collates data and assigns identifiers. It has achieved the development of two virtual research environments and uptake is being considered by the Fisheries Resources Monitoring System partnership. The outcome could include using unique identifiers for product labeling and supporting international goals. The exploitation plan is to gradually populate the GRSF and present it at the FAO Committee on Fisheries in 2018.
BlueBRIDGE: Major Achievements & future vision (BlueBRIDGE)
BlueBRIDGE is a project funded by the European Union to support blue growth (sustainable use of ocean resources) through virtual research environments (VREs) and innovative applications based on EU e-infrastructures. The project aims to facilitate collaboration between scientists, SME innovators, and educators addressing blue growth challenges. It has created 54 VREs covering topics like aquaculture, biodiversity, and stock assessment. BlueBRIDGE also works to enhance e-infrastructure capabilities and integrate resources from multiple providers. Going forward, it seeks to maintain existing VREs and products through business agreements to ensure their long-term sustainability in supporting the blue growth community.
Managing tuna fisheries data at a global scale: the Tuna Atlas VRE (BlueBRIDGE)
On 18th January 2018, at 3pm CET, BlueBRIDGE hosted the webinar "Managing tuna fisheries data at a global scale: the Tuna Atlas VRE", which presented how, through the Tuna Atlas Virtual Research Environment (VRE), users can easily produce their own datasets of fisheries at regional, multi-regional or global scale, and how they can share these datasets in ways that allow other users to access, process and visualise them efficiently.
SeaDataCloud – further developing the pan-European SeaDataNet infrastructure ... (BlueBRIDGE)
SeaDataCloud is a project to further develop SeaDataNet, a pan-European infrastructure for managing marine and ocean data. It aims to update standards, improve services and products, adopt new technologies, and strengthen cooperation between SeaDataNet data centers and EUDAT e-infrastructure providers. Key goals include upgrading the Central Data Index service to use cloud computing, integrating data from other programs, and developing a virtual research environment for advanced data analysis and product development using marine data. EUDAT partners will contribute technical expertise to help achieve these objectives and enhance the management and use of oceanographic data across Europe.