Making FAIR easy

MAKING FAIR EASY
OAI11 – Workshop on Innovations in Scholarly Communication – Geneva, June 19, 2019
Luiz Bonino
luiz.bonino@go-fair.org

FAIR PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618

THE PRINCIPLES TARGET PRIMARILY MACHINES

FAIR DATA PRINCIPLES - METADATA
Findable:
F1. metadata are assigned a globally unique and persistent
identifier;
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
implementable;
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other metadata;
Reusable:
R1. metadata are richly described with a plurality of accurate and
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;

FAIR DATA PRINCIPLES – DATA/DIGITAL RESOURCES
Findable:
F1. data are assigned a globally unique and persistent
identifier;
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
implementable;
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other (meta)data;
Reusable:
R1. metadata are richly described with a plurality of accurate and
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;

FAIR DATA PRINCIPLES – SUPPORT INFRASTRUCTURE
Findable:
F1. (meta)data are assigned a globally unique and
persistent identifier;
F3. metadata clearly and explicitly include the identifier
of the data it describes;
resource;
Accessible:
implementable;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation.
I2. (meta)data use vocabularies that follow FAIR
principles;
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of
accurate and relevant attributes;
R1.1. (meta)data are released with a clear and
accessible data usage license;
R1.2. (meta)data are associated with detailed
provenance;
standards;

DATA EXPERT EFFORT
Source: Data Science Report 2016, CrowdFlower, 2016: http://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf

BREAK DOWN DATA EXPERT EFFORT PRINCIPLES
Findable:
identifier;
data it describes;
resource;
Accessible:
implementable;
available;
Interoperable:
(meta)data;
Reusable:
usage license;
standards;
19% of the time
60% of the time

COST OF NOT HAVING FAIR DATA
SOURCE: Study on the cost of not having FAIR research data
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1

SOURCE: Study on the cost of not having FAIR research data
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
COST OF NOT HAVING FAIR DATA
€ 10,2B
with an expected economic benefit of up to € 20,1 by 2020

CURRENT FAIR ECOSYSTEM
Create PublishPlan Find
011001
1
110010
1
100110
0
BYOD FAIR Hackathon
Training
Annotate
https://github.com/FAIRDataTeam
Evaluate

FAIR ECOSYSTEM
Create PublishPlan Find
011001
1
110010
1
100110
0
Annotate Evaluate
Metadata
Management
Ontology
Modeling
Metadata
Registry
1 - https://github.com/nemo-ufes/ontouml-lightweight-editor/wiki
2 - https://github.com/MenthorTools
Ontology
Registry
Standard
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator

FAIR ECOSYSTEM
Metadata
Registry
Metadata
Template
Creator

FAIR ECOSYSTEM
Ontology
Metadata
Template
Creator
Metadata
Registry

FAIR ECOSYSTEM
Ontology
Registry
Ontology
Metadata
Template
Creator
Metadata
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator
Metadata
Editor
Ontology
Registry
Ontology
Metadata
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator
Metadata
Editor
Ontology
Registry
Ontology
Metadata
Registry
Standard
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator
FAIR Data
ResourceMetadata
Editor
Ontology
Registry
Ontology
Metadata
Registry
Standard
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator
FAIR Data
ResourceMetadata
Editor
MetadataFAIR Data
Ontology
Registry
Ontology
Metadata
Registry
Standard
Registry

FAIR ECOSYSTEM
Metadata
Template
Creator
FAIR Data
ResourceMetadata
Editor
MetadataFAIR Data
Ontology
Registry
Ontology
Metadata
Registry
Standard
Registry
Repository

FAIR ECOSYSTEM
011001
1
110010
1
100110
0
Metadata
Template
Creator
FAIR Data
ResourceMetadata
Editor
MetadataFAIR Data
Ontology
Registry
Ontology
Metadata
Registry
Standard
Registry

DATA STEWARDSHIP WIZARD – HTTPS://DS-WIZARD.ORG
 Dynamic hierarchical questionnaire
 Customisable DS Knowledge Model
 Integration with the Data Stewardship for Open Science book
(FAIR) metrics evaluation
 Desirability of answers
 Integration with other resources (FAIRSharing, Bio.tools, …) –
upcoming
 Machine actionable DMPs - part of http://activedmps.org/

The FAIR Funder Components: https://ds-wizard.org
DATA STEWARDSHIP WIZARD

REPOSITORIES SUPPORTING USERS TO ACHIEVE FAIR
Findable:
identifier;
data it describes;
resource;
Accessible:
implementable;
available;
Interoperable:
(meta)data;
Reusable:
usage license;
standards;

• A registry of inter-linked (meta)data standards, repositories,
knowledgebases and policies
• A set of tools and services to discover and use these resources,
also connected to FAIRness assessment and evaluation tools
A flagship output of the: Core part of implementation networks in:
ERC EOSC Science Europe
Recommended by funders, e.g.:

• We guide consumers to discover, select and use standards,
repositories, knowledgebases and policies with confidence
• And producers to make their resources more visible, widely
adopted and cited
A flagship output of the: Core part of implementation networks in:
ERC EOSC Science Europe
Recommended by funders, e.g.:

Researchers in academia,
industry, government
Developers and curators of
resources
Journal publishers with data
policy
Research data facilitators,
librarians, trainers
Learned societies, unions
and associations
Funders and data policy
makers
https://doi.org/10.1038/s41587-019-0080-8
Open Access CC-BY
69 authors and users,
representing different stakeholder groups:

Examining FAIR
“Maturity”
Mark D Wilkinson
CBGP UPM-INIA, Madrid,
Spain
https://tinyurl.com/MeasuringFAIR
presenting the work of:
Mark Wilkinson, Erik Schultes, Susanna
Assunta Sansone, Ricardo de Miranda
Azevedo, Luiz Olavo Bonino da Silva Santos,
Philippe Rocca-Serra, Peter McQuilton,
Dominique Batista, Michel Dumontier

PREMISE
FAIR is more than a “fuzzy feeling”
“This necessitates machines to be capable of autonomously and
appropriately acting when faced with the wide range of types, formats,
and access-mechanisms/protocols that will be encountered during their
self-guided exploration of the global data ecosystem.”
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR

“FAIRness”
is
Measurable
...almost by
definition!
Creating a Web of data that can be
“intelligently” (re)used by Machines
necessitates certain behaviours by
data providers
These behaviours can be concretely described
Software can then be written that can utilize
these behaviours to find, access, and correctly
reuse the data

“FAIRness”
is
Measurable
...almost by
definition!
If that is true…
Software can also be written that examines a
given digital entity to see if it exhibits those
behaviours.
“FAIRness”
is
Measurable
...almost by
definition!

The
Founding
FAIR Metrics
Authoring
Team
...yes,
we were
self-
selecting!
Erik Schultes
Biomedical Researcher & Science Coordinator @ GFISCO
Luiz Olavo Bonino da Silva Santos
Semantic Interoperability & Technology Coordinator @ GFISCO
Mark Wilkinson, FBBVA Industry Chair, Biotechnology
Senior Researcher, Biological Informatics (--> Genetics)
Michel Dumontier
Distinguished Professor of Data Science (--> Biochemistry)
Susanna Assunta Sansone
Associate Professor, Engineering Science, U Oxford
Hon. Academic Editor, Scientific Data Journal
Peter Doorn
Director, Data Archiving and Networked Services, NL
Source:MarkD.Wilkinson-https://tinyurl.com/MeasuringFAIR

FAIR Metric
Authoring
Template

GEN 1 FAIR METRICS
After 3 face-to-face meetings
And over 6 months
We successfully created…

GEN 1 FAIR METRICS
After 3 face-to-face meetings
And over 6 months
We successfully created…
3 Metrics
(and we weren’t even entirely happy with those!)

+ + =
14
FAIR
Metrics

Public
Discussion
Ensued
Some Metrics generated a fair
bit of discussion!

Publication:
A Design Framework
for “Metrics”
June 2018
https://www.nature.com/articles/
sdata2018118

TYPICAL “GENERATION 1” METRIC
Example: FM-F1B, Identifier Persistence
Provide: URL to a document describing your persistence policy
Test: Does the URL resolve?

SEMI-AUTOMATED TEST VIA QUESTIONNAIRE PLUS “VALIDATION”
These kinds of Metrics lend themselves well to questionnaire-style interfaces
They raise awareness of what issues should be considered when creating FAIR resources
They can be (minimally) validated

(MANY) SUCH INTERFACES HAVE BEEN CREATED
https://purplepolarbear.nl
(this one uses our Generation 1 Metrics,
but a wide range of other questionnaires
have been created by other communities)

QUESTIONNAIRE INTERFACE (ONE EXAMPLE OF MANY)
https://purplepolarbear.nl
One advantage of
questionnaire-based
interfaces is that they can
be used to inform
(this text taken from “FAIR
Principles Explained” at GO
FAIR)

FAILURES OF OUR INITIAL APPROACH
 It was subjective - the most rigorous evaluations were manually executed, examining
the answers to the questionnaire based on “our opinion”
 The resulting scores were a range between 0 and 1 (represented by 0 to 5 ‘stars’),
but the selection of a score was also subjective.
 There was no way to interpret a partial score - what is the difference between a
score of .25 and a score of .75?

MOST IMPORTANTLY
Questionnaire-based approaches do not
evaluate the entire point of FAIRness

PREMISE
FAIR is more than a “fuzzy feeling”
“This necessitates machines to be capable of autonomously and
appropriately acting when faced with the wide range of types, formats,
and access-mechanisms/protocols that will be encountered during their self-
guided exploration of the global data ecosystem.”

GENERATION 2 FAIR MATURITY INDICATORS
 What do machines see?

OBJECTIVE: OBJECTIVITY IN FAIR EVALUATION
 Organizations want to KNOW if they are FAIR
 Organizations want ADVICE on what they can do better
 What is required for enhanced FAIR behaviours
 How difficult is it?
 How much will it cost?
 What expertise do I need?
 Objective evaluations (with narrative feedback on failures) provide these answers!

NON-TECHNICAL CHANGES IN GENERATION 2
 Change in name: “Maturity Indicators” sounds less judgemental than “Metrics”
 More emphasis on community-driven FAIR Evaluation
 Better documentation for how to contribute/participate

TECHNICAL FAILURES IN GENERATION 1 QUESTIONNAIRES
1. They don’t scale to the entire world!
2. They are time-consuming
3. They cannot be executed by “anyone” (only by the person who knows the resource
deeply)
4. They are (potentially) biased
5. They don’t adequately test one of the main point of FAIR, because a human is not
capable of evaluating this:
Can a machine find
and (re)use the data?

FIXING THE PROBLEMS
Generation 2 Maturity Indicators (MIs)
Approach to building Gen2 MIs:
 In-depth review of answers to original questionnaire (~11 resources)
 Record what providers are doing in-practice
 Note complaints from key data repositories v.v. “unfair” evaluations
 Build a “metadata harvesting” library that attempts to be
“exhaustive”
 That is, it pursues paths that I (personally) don’t consider to be “FAIR in-spirit”
 Trying not to be “prescriptive”!

METADATA HARVESTER

GEN2 MATURITY INDICATOR TESTS
 Are “standalone” Web interfaces that can be written/published by anyone
 Consume ONLY the GUID of the metadata
 they invoke The Harvester with that
 Their interface metadata is recorded using
 They are registered in the smartAPI registry for discovery (not required)
 They are registered in the Evaluator to be used by others (not required)
 They test the “block of metadata” for various features

GEN2 MATURITY INDICATOR TESTS
For example, the Generation 2 Maturity Indicator “data identifier in metadata” looks
for a hash key, or a LD property, from a list of widely-used properties that point at data,
including:
foaf:primaryTopic, dcat:distribution, ldp:contains, schema:mainEntity….

DOCS FOR GEN2_MI_F3 “DATA IDENTIFIER EXPLICITLY IN METADATA”
To locate the data identifier, hash data is tested for the keys:
● codeRepository
● mainEntity
● primaryTopic
● IAO:0000136 (is about)
● IAO_0000136
● SIO:000332 (is about)
● SIO_000332
● distribution
● contains
Graph data is tested for the properties:
● schema:codeRepository
● schema:mainEntity
● foaf:primaryTopic
● IAO:0000136 (information artifact ontology 'is about')
● SIO:000332 (SemanticScience Integrated Ontology 'is about')
● schema:distribution
● DCAT:distribution (Data Catalogue vocabulary)
● ldp:contains (Linked Data Platform)

RESULTS
Pass/Fail
● Gen2 Maturity Indicators return binary pass/fail
● They attempt to log everything they do, so that the output contains a record of why the test
passed/failed
○ The POINT of the test is to encourage incremental improvements of FAIRness
○ Detailed feedback is important not only for transparency, but to be informative and
supportive

DESIGNED FOR BOTTOM-UP COMMUNITY-DRIVEN EVALUATION
 Anyone can create a new Maturity Indicator, and submit it for open discussion
 Anyone can suggest edits to existing Maturity Indicators, if we’re not “fair”
https://github.com/FAIRMetrics/Metrics/blob/master/MaturityIndicators/README.md

(CURRENT) WORKFLOW FOR REGISTERING A NEW MI
 Community members write a human-readable Maturity Indicator following a
MarkDown template
 Pull request on GitHub
 The proposed Indicator is “scraped” by FAIRSharing, and registered as “under
consideration”
 A (yet to be defined) process, including discussion and advice from FAIR evaluation
experts, will ensue
 The Maturity Indicator will be approved
 FAIRSharing will update their record to show that this is an approved Maturity
Indicator

NEW INTERFACE
http://tinyurl.com/FAIR-Maturity-Tester

SUMMARY
The point of objective, community-driven evaluations is to empower community
governance organizations and agencies
THEY choose what is evaluated,
based on what
THEY care about!
THEY may also choose to check their resources against a more “core/global” standard
set of tests

THANK YOU!!!
Luiz Bonino
International Technology Coordinator – GO FAIR
Associate Professor BioSemantics – LUMC
E-mail: luiz.bonino@go-fair.org
Skype: luizolavobonino
Web: www.go-fair.org

Making FAIR easy

Recommended

Recommended

More Related Content

Similar to Making FAIR easy

Similar to Making FAIR easy (20)

More from Luiz Olavo Bonino da Silva Santos

More from Luiz Olavo Bonino da Silva Santos (7)

Recently uploaded

Recently uploaded (20)

Making FAIR easy