This document provides an overview of FAIR data principles and the FAIR data ecosystem. It discusses what FAIR data is, including that FAIR data aims to support communities in publishing and utilizing scientific data and knowledge in a findable, accessible, interoperable, and reusable manner. It then describes the different levels of the FAIR data ecosystem, including normative principles, standards in the FAIR data protocol, FAIR data resources that comply with these standards, and systems/tools that use FAIR data. It provides examples of converting raw data into FAIR data resources and the potential applications of a FAIR data ecosystem.
4. DATA STEWARDSHIP
Combination of all expertise to treat data well in a project:
■ Experiment design and data-design;
■ Re-use of existing data where possible;
■ Planning of the storage, networking and computing
infrastructure;
■ Data acquisition and processing;
■ Data publishing in a format that allows functional
interlinking of data(sets) as well as in a format suitable for
long-term preservation.
5. FAIR DATA STEWARDSHIP
Combination of all expertise to treat data well in a project:
■ Experiment design and data-design;
■ Re-use of existing data where possible;
■ Planning of the storage, networking and computing infrastructure;
■ Data acquisition and processing;
■ Data publishing in a format that allows functional interlinking of
data(sets) as well as in a format suitable for long-term preservation.
14. WHAT IS FAIR DATA?
FAIR Data aims to support existing communities in their
attempts to enable valuable scientific data and knowledge to
be published and utilised in a ‘FAIR’ manner.
Findable - (meta)data is uniquely and persistently identifiable.
Should have basic machine readable descriptive metadata.
Accessible - data is reachable and accessible by humans and
machines using standard formats and protocols.
Interoperable - (meta)data is machine readable and annotated
with resolvable vocabularies/ontologies.
Reusable - (meta)data is sufficiently well-described to allow
(semi)automated integration with other compatible data sources.
15. THE FAIR ECOSYSTEM
FAIR Data Principles
FAIR Data Protocol
FAIR Data Resources
FAIR Data Core Technologies
FAIR Data Systems/Tools
Normative
Artefact
Software
16. FAIR ECOSYSTEM - NORMATIVE LEVEL
FAIR Data Principles - general principles guiding FAIR data
solutions;
FAIR Data Protocol - complying with the FAIR Data
Principles, provide guidelines for implementing FAIR data
solutions, e.g., standards, APIs, technologies, …;
17. FAIR DATA PROTOCOL
Findable - standards for describing the dataset with the relevant
metadata;
Accessible - standards for represent and access the data
according to the defined usage license;
Interoperable - standards for machine readable descriptions of
the (meta)data and (semantic)annotation;
Reusable - standards for semantic annotation of the (meta)data
supporting machine reasoning, and standards for defining data
provenance and support citation;
The standards include technologies (e.g., RDF, nano pub, JSON,
OWL, etc.) as well as protocols and APIs.
18. FAIR ECOSYSTEM - ARTEFACT LEVEL
FAIR Data Resource - datasets expressed using one of
the prescribed standards of the FAIR Data Protocol and
with metadata complying with the protocol.
Annotation Ontology - reference conceptual model used
to provide semantics to elements of FAIR Data Resources
through annotation.
Controlled vocabularies, dictionaries, etc.
19. FAIR DATA RESOURCE
Datasets expressed using one of the prescribed standards of the
FAIR Data Protocol, with metadata complying with the protocol
and license. The original dataset is transformed into a FAIR format
and proper metadata and license are added to produce a FAIR
Data Resource. The original and the FAIR version can co-exist,
each one fulfilling its own purpose.
FAIR Conversion
FAIR Data Resource
23. SHARING DATA
I would like to exploit common genotype-
phenotype relations between Alzheimer’s
Disease and Huntington’s Disease…
I need to combine AD and HD data…
I can help
with that!
I can help
with that!
Source: Marcos Roos
25. SHARING LINKABLE DATA
Source: Marcos Roos
I can go straight to answering my questions
with data from multiple data owners!
Patients will be so pleased with this speed-up!
Here’s my
Linked Data,
have fun!
Here’s my
Linked Data,
have fun!
29. Raw data
(many formats)
Processed data
(primary storage format)
FAIR transformation
FAIR (meta)data
(RDF,XML etc.)
ProvenanceInitial transformation
30. Raw data
(many formats)
Processed data
(primary storage format)
FAIR transformation
FAIR (meta)data
(RDF,XML etc.)
ProvenanceInitial transformation
31. Raw data
(many formats)
FAIR download
(in local format)
Processed data
(primary storage format)
FAIR transformation
FAIR (meta)data
(RDF,XML etc.)
ProvenanceInitial transformation
32. Raw data
(many formats)
FAIR download
(in local format)
Processed data
(primary storage format)
FAIR transformation
FAIR (meta)data
(RDF,XML etc.)
High-Performance
Analysis
ProvenanceInitial transformation
Analysis transformation
35. BRING YOUR OWN DATA - BYOD
Goals:
■ Learn how to make data linkable “hands-on” with experts
■ Create a “telling story” to demonstrate its use
Composition:
■ Data owners – specialists on given datasets
■ Data interoperability experts
■ Domain experts
Source: Marcos Roos
41. A particular class of FAIR Data System that provides access to
published datasets. The datasets can be external or internal to the
FAIR Data Point. Also, the source data can be a regular (non-FAIR)
dataset or a FAIR Data Resource. If the source data is non-FAIR,
the FAIR Data Point needs to made the necessary FAIR
transformations on the fly.
44. A particular class of FAIR Data System to provide
support for data interoperability;
Supports publication and access to FAIR data.
Fosters an ecosystems of applications and services;
Federated architecture: different FAIRports (and other
FAIR Data Systems) are interconnectable;
Supports citations of datasets and data items;
Provides metrics for data usage and citation;
52. HOW TO REALISE
DTL
National FAIR
data engineering team
Elixir
Other
Project
FAIR Data IG
(P.I.s with
data)
Data SAC
G.M., B.M., M.G.
Data Core Team
FAIR Data Executive Team
L.B. (CTO), R.H., P.B., M.S., J.B., J.W.
engineers in the
FAIR Data virtual team
Core
FAIR
Technol
ogyFAIR Data V.T.
FAIR Data V.T.
FAIR Data V.T.
FAIR Data V.T.
FAIR Data V.T.
FAIR Data V.T.
Local
“DTLs”/
projects
project
s
From a process point-of-view, in our vision, data stewardship encompasses a planning phase, where you prepare the data environment followed by a management phase where the actual data creation and processing takes place and then, after your project finishes, you have to take care of the long-term preservation of the data.
From a sustainability point-of-view, the planning phase, depicted here as the data management plans, will be increasingly required by funding programs such as Horizon 2020. However, there is still a challenge to fund data preservation after the project since most of the time, projects funds cannot be used. And underlying these phases we should have the interoperability backbones, standards and procedures, such as the ones being developed/deployed in Excelerate,
The central point in the ecosystem is what we call a FAIR Data Resource, which is composed of a dataset in a FAIR format, its related metadata and license.
As I mentioned, BYODs are currently used to create FAIR Data (Resource). However, a significant part of a BYOD is spent on the FAIR Data Transformation.
Because of this we are working on a tool to automate this process as much as possible, the FAIRifier.
One of the tasks for both BYODs and the FAIRifier is to create a data model for what the original dataset should be transformed to, including proper semantic annotations of the types, etc. For common data structure types we are working on a FAIR Data Model Registry which stores the non-FAIR dataset model and the associated transformation model.
In this way, the FAIRifier can be integrated with the FAIR Data Model Registry to try to match an input dataset model with the ones stored in the registry. Once a match is found, the information about how to transform the dataset into a FAIR format is retrieved by the FAIRifier improving the process automation.