3. FAIR PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
6. FAIR PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
7. FAIR DATA PRINCIPLES - METADATA
Findable:
F1. metadata are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other metadata;
Reusable:
R1. metadata are richly described with a plurality of accurate and
relevant attributes;
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;
https://www.nature.com/articles/sdata201618
8. FAIR DATA PRINCIPLES – DATA/DIGITAL RESOURCES
Findable:
F1. data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other (meta)data;
Reusable:
R1. metadata are richly described with a plurality of accurate and
relevant attributes;
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;
https://www.nature.com/articles/sdata201618
9. FAIR DATA PRINCIPLES – SUPPORT INFRASTRUCTURE
Findable:
F1. (meta)data are assigned a globally unique and
persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier
of the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation.
I2. (meta)data use vocabularies that follow FAIR
principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of
accurate and relevant attributes;
R1.1. (meta)data are released with a clear and
accessible data usage license;
R1.2. (meta)data are associated with detailed
provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
11. DATA EXPERT EFFORT
Source: Data Science Report 2016, CrowdFlower, 2016: http://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
12. DATA EXPERT EFFORT
Source: Data Science Report 2016, CrowdFlower, 2016: http://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
13. BREAK DOWN DATA EXPERT EFFORT PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
19% of the time
60% of the time
14. COST OF NOT HAVING FAIR DATA
SOURCE: Study on the cost of not having FAIR research data
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
15. SOURCE: Study on the cost of not having FAIR research data
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
COST OF NOT HAVING FAIR DATA
€ 10,2B
with an expected economic benefit of up to € 20,1 by 2020
35. DATA STEWARDSHIP WIZARD – HTTPS://DS-WIZARD.ORG
Dynamic hierarchical questionnaire
Customisable DS Knowledge Model
Integration with the Data Stewardship for Open Science book
(FAIR) metrics evaluation
Desirability of answers
Integration with other resources (FAIRSharing, Bio.tools, …) –
upcoming
Machine actionable DMPs - part of http://activedmps.org/
36. The FAIR Funder Components: https://ds-wizard.org
DATA STEWARDSHIP WIZARD
37. The FAIR Funder Components: https://ds-wizard.org
DATA STEWARDSHIP WIZARD
38. REPOSITORIES SUPPORTING USERS TO ACHIEVE FAIR
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
39. • A registry of inter-linked (meta)data standards, repositories,
knowledgebases and policies
• A set of tools and services to discover and use these resources,
also connected to FAIRness assessment and evaluation tools
A flagship output of the: Core part of implementation networks in:
ERC EOSC Science Europe
Recommended by funders, e.g.:
40. • We guide consumers to discover, select and use standards,
repositories, knowledgebases and policies with confidence
• And producers to make their resources more visible, widely
adopted and cited
A flagship output of the: Core part of implementation networks in:
ERC EOSC Science Europe
Recommended by funders, e.g.:
41. Researchers in academia,
industry, government
Developers and curators of
resources
Journal publishers with data
policy
Research data facilitators,
librarians, trainers
Learned societies, unions
and associations
Funders and data policy
makers
https://doi.org/10.1038/s41587-019-0080-8
Open Access CC-BY
69 authors and users,
representing different stakeholder groups:
43. Examining FAIR
“Maturity”
Mark D Wilkinson
CBGP UPM-INIA, Madrid,
Spain
https://tinyurl.com/MeasuringFAIR
presenting the work of:
Mark Wilkinson, Erik Schultes, Susanna
Assunta Sansone, Ricardo de Miranda
Azevedo, Luiz Olavo Bonino da Silva Santos,
Philippe Rocca-Serra, Peter McQuilton,
Dominique Batista, Michel Dumontier
44. PREMISE
FAIR is more than a “fuzzy feeling”
“This necessitates machines to be capable of autonomously and
appropriately acting when faced with the wide range of types, formats,
and access-mechanisms/protocols that will be encountered during their
self-guided exploration of the global data ecosystem.”
https://www.nature.com/articles/sdata201618
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
45. “FAIRness”
is
Measurable
...almost by
definition!
Creating a Web of data that can be
“intelligently” (re)used by Machines
necessitates certain behaviours by
data providers
These behaviours can be concretely described
Software can then be written that can utilize
these behaviours to find, access, and correctly
reuse the data
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
46. “FAIRness”
is
Measurable
...almost by
definition!
If that is true…
Software can also be written that examines a
given digital entity to see if it exhibits those
behaviours.
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
“FAIRness”
is
Measurable
...almost by
definition!
47. The
Founding
FAIR Metrics
Authoring
Team
...yes,
we were
self-
selecting!
Erik Schultes
Biomedical Researcher & Science Coordinator @ GFISCO
Luiz Olavo Bonino da Silva Santos
Semantic Interoperability & Technology Coordinator @ GFISCO
Mark Wilkinson, FBBVA Industry Chair, Biotechnology
Senior Researcher, Biological Informatics (--> Genetics)
Michel Dumontier
Distinguished Professor of Data Science (--> Biochemistry)
Susanna Assunta Sansone
Associate Professor, Engineering Science, U Oxford
Hon. Academic Editor, Scientific Data Journal
Peter Doorn
Director, Data Archiving and Networked Services, NL
Source:MarkD.Wilkinson-https://tinyurl.com/MeasuringFAIR
49. Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
GEN 1 FAIR METRICS
After 3 face-to-face meetings
And over 6 months
We successfully created…
50. Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
GEN 1 FAIR METRICS
After 3 face-to-face meetings
And over 6 months
We successfully created…
3 Metrics
(and we weren’t even entirely happy with those!)
53. Publication:
A Design Framework
for “Metrics”
June 2018
https://www.nature.com/articles/
sdata2018118
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
54. TYPICAL “GENERATION 1” METRIC
Example: FM-F1B, Identifier Persistence
Provide: URL to a document describing your persistence policy
Test: Does the URL resolve?
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
55. TYPICAL “GENERATION 1” METRIC
Example: FM-F1B, Identifier Persistence
Provide: URL to a document describing your persistence policy
Test: Does the URL resolve?
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
56. SEMI-AUTOMATED TEST VIA QUESTIONNAIRE PLUS “VALIDATION”
These kinds of Metrics lend themselves well to questionnaire-style interfaces
They raise awareness of what issues should be considered when creating FAIR resources
They can be (minimally) validated
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
57. (MANY) SUCH INTERFACES HAVE BEEN CREATED
https://purplepolarbear.nl
(this one uses our Generation 1 Metrics,
but a wide range of other questionnaires
have been created by other communities)
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
58. QUESTIONNAIRE INTERFACE (ONE EXAMPLE OF MANY)
https://purplepolarbear.nl
One advantage of
questionnaire-based
interfaces is that they can
be used to inform
(this text taken from “FAIR
Principles Explained” at GO
FAIR)
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
59. FAILURES OF OUR INITIAL APPROACH
It was subjective - the most rigorous evaluations were manually executed, examining
the answers to the questionnaire based on “our opinion”
The resulting scores were a range between 0 and 1 (represented by 0 to 5 ‘stars’),
but the selection of a score was also subjective.
There was no way to interpret a partial score - what is the difference between a
score of .25 and a score of .75?
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
61. PREMISE
FAIR is more than a “fuzzy feeling”
“This necessitates machines to be capable of autonomously and
appropriately acting when faced with the wide range of types, formats,
and access-mechanisms/protocols that will be encountered during their self-
guided exploration of the global data ecosystem.”
https://www.nature.com/articles/sdata201618
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
62. GENERATION 2 FAIR MATURITY INDICATORS
What do machines see?
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
63. OBJECTIVE: OBJECTIVITY IN FAIR EVALUATION
Organizations want to KNOW if they are FAIR
Organizations want ADVICE on what they can do better
What is required for enhanced FAIR behaviours
How difficult is it?
How much will it cost?
What expertise do I need?
Objective evaluations (with narrative feedback on failures) provide these answers!
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
64. NON-TECHNICAL CHANGES IN GENERATION 2
Change in name: “Maturity Indicators” sounds less judgemental than “Metrics”
More emphasis on community-driven FAIR Evaluation
Better documentation for how to contribute/participate
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
65. TECHNICAL FAILURES IN GENERATION 1 QUESTIONNAIRES
1. They don’t scale to the entire world!
2. They are time-consuming
3. They cannot be executed by “anyone” (only by the person who knows the resource
deeply)
4. They are (potentially) biased
5. They don’t adequately test one of the main point of FAIR, because a human is not
capable of evaluating this:
Can a machine find
and (re)use the data?
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
66. FIXING THE PROBLEMS
Generation 2 Maturity Indicators (MIs)
Approach to building Gen2 MIs:
In-depth review of answers to original questionnaire (~11 resources)
Record what providers are doing in-practice
Note complaints from key data repositories v.v. “unfair” evaluations
Build a “metadata harvesting” library that attempts to be
“exhaustive”
That is, it pursues paths that I (personally) don’t consider to be “FAIR in-spirit”
Trying not to be “prescriptive”!
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
67. Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
METADATA HARVESTER
68. GEN2 MATURITY INDICATOR TESTS
Are “standalone” Web interfaces that can be written/published by anyone
Consume ONLY the GUID of the metadata
they invoke The Harvester with that
Their interface metadata is recorded using
They are registered in the smartAPI registry for discovery (not required)
They are registered in the Evaluator to be used by others (not required)
They test the “block of metadata” for various features
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
69. GEN2 MATURITY INDICATOR TESTS
For example, the Generation 2 Maturity Indicator “data identifier in metadata” looks
for a hash key, or a LD property, from a list of widely-used properties that point at data,
including:
foaf:primaryTopic, dcat:distribution, ldp:contains, schema:mainEntity….
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
70. DOCS FOR GEN2_MI_F3 “DATA IDENTIFIER EXPLICITLY IN METADATA”
To locate the data identifier, hash data is tested for the keys:
● codeRepository
● mainEntity
● primaryTopic
● IAO:0000136 (is about)
● IAO_0000136
● SIO:000332 (is about)
● SIO_000332
● distribution
● contains
Graph data is tested for the properties:
● schema:codeRepository
● schema:mainEntity
● foaf:primaryTopic
● IAO:0000136 (information artifact ontology 'is about')
● SIO:000332 (SemanticScience Integrated Ontology 'is about')
● schema:distribution
● DCAT:distribution (Data Catalogue vocabulary)
● ldp:contains (Linked Data Platform)
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
71. RESULTS
Pass/Fail
● Gen2 Maturity Indicators return binary pass/fail
● They attempt to log everything they do, so that the output contains a record of why the test
passed/failed
○ The POINT of the test is to encourage incremental improvements of FAIRness
○ Detailed feedback is important not only for transparency, but to be informative and
supportive
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
72. Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
73. Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
74. DESIGNED FOR BOTTOM-UP COMMUNITY-DRIVEN EVALUATION
Anyone can create a new Maturity Indicator, and submit it for open discussion
Anyone can suggest edits to existing Maturity Indicators, if we’re not “fair”
https://github.com/FAIRMetrics/Metrics/blob/master/MaturityIndicators/README.md
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
75. (CURRENT) WORKFLOW FOR REGISTERING A NEW MI
Community members write a human-readable Maturity Indicator following a
MarkDown template
Pull request on GitHub
The proposed Indicator is “scraped” by FAIRSharing, and registered as “under
consideration”
A (yet to be defined) process, including discussion and advice from FAIR evaluation
experts, will ensue
The Maturity Indicator will be approved
FAIRSharing will update their record to show that this is an approved Maturity
Indicator
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
76. NEW INTERFACE
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
http://tinyurl.com/FAIR-Maturity-Tester
77. SUMMARY
The point of objective, community-driven evaluations is to empower community
governance organizations and agencies
THEY choose what is evaluated,
based on what
THEY care about!
THEY may also choose to check their resources against a more “core/global” standard
set of tests
Source: Mark D. Wilkinson - https://tinyurl.com/MeasuringFAIR
78. THANK YOU!!!
Luiz Bonino
International Technology Coordinator – GO FAIR
Associate Professor BioSemantics – LUMC
E-mail: luiz.bonino@go-fair.org
Skype: luizolavobonino
Web: www.go-fair.org