FAIR: From Principles to Practices
Susanna-Assunta Sansone
ORCiD: 0000-0001-5306-5690 | Twitter: @SusannaASansone
UKRN - From Data to Metadata: Ensuring reproducibility in biomedical research, 22 Oct 2020
Slides: https://www.slideshare.net/SusannaSansone
datareadiness.eng.ox.ac.uk
Associate Professor, Engineering Science
Associate Director, Oxford e-Research Centre
Datasets SOPs Figures Workflows Slides Codes Tools Databases Algorithms Document
This requires data that are
• Discoverable by humans and machines
• Retrievable and structured in standard format(s)
• Self-described so that third parties can make sense of them
Discoveries are also made using shared data
Forbes article on 2016 Data Scientist Report
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#276a35e6f637
Data preparation accounts for about 80% of the work of data scientists
A set of principles to enhance the value of all digital resources and their reuse by humans and machines
Data that is discoverable and usable at scale
Findable
Accessible
Interoperable
Reusable
• Globally unique, resolvable, and persistent identifiers
▪ To retrieve and connect data
• Community defined descriptive metadata
▪ To enhance discoverability
• Common terminologies
▪ To use the same term to mean the same thing
• Detailed provenance
▪ To contextualize the data and facilitate reproducibility
• Terms of access
▪ As open as possible, as closed as necessary
• Terms of use
▪ Clear licences, ideally to enable innovation and reuse
The FAIR Principles in a nutshell
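The six elements in the nutshell above can be read as a checklist for a dataset description. The following is a minimal sketch of that idea, not part of the FAIR Principles themselves: the `DatasetRecord` class, its field names and the example DOI are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical dataset description; field names are illustrative only."""
    identifier: Optional[str] = None    # globally unique, resolvable, persistent ID
    description: Optional[str] = None   # community-defined descriptive metadata
    terms: List[str] = field(default_factory=list)  # IDs from common terminologies
    provenance: Optional[str] = None    # how the data were produced
    access_terms: Optional[str] = None  # terms of access
    licence: Optional[str] = None       # terms of use


def missing_elements(record: DatasetRecord) -> List[str]:
    """Return the nutshell elements that the record leaves empty."""
    checks = {
        "identifier": record.identifier,
        "descriptive metadata": record.description,
        "common terminologies": record.terms,
        "provenance": record.provenance,
        "terms of access": record.access_terms,
        "terms of use": record.licence,
    }
    return [name for name, value in checks.items() if not value]


# Example: a record that only carries an identifier and a licence.
partial = DatasetRecord(identifier="doi:10.1234/example", licence="CC-BY-4.0")
print(missing_elements(partial))
```

A check like this only reports whether each element is present; it says nothing about its quality, which is where community standards come in.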
doi.org/10.2777/1524
www.gov.uk/government/publications/open-research-data-task-force-final-report
doi.org/10.5281/zenodo.1245568
www.turing.ac.uk/research/impact-stories/changing-culture-data-science
FAIR has de facto become a global norm
www.fair-access.net.au
doi.org/10.1787/25186167
doi.org/10.2777/02999
Impact on innovation
Total of: €27bn/yr in Europe
The cost of not having FAIR research data
The scholarly publishing ecosystem is changing
Data-related mandates by funders and institutions are growing
Researchers need
recognition and credit
theconversation.com/how-robots-can-help-us-embrace-a-more-human-view-of-disability-76815
Human-machine collaboration is the future
o 21% pharmacology data (doi.org/10.1038/nrd3439-c1)
o 11% cancer data (doi.org/10.1038/483531a)
o unsatisfactory in ML (openreview.net/pdf?id=By4l2PbQ-)
towardsdatascience.com/scientific-data-analysis-pipelines-and-reproducibility-75ff9df5b4c5
Reproducibility of published studies is still problematic
Responding to needs and crisis
Findable
Accessible
Interoperable
Reusable
Is NOT a standard but a set of guiding
principles that provide for a continuum
of increasing reusability, via many
different implementations
The FAIR Principles are aspirational
Define Implement Embed & Sustain
Concepts for FAIR implementation
FAIR culture
FAIR ecosystem
Skills for FAIR
Incentives and metrics for FAIR data and services
Investment in FAIR
Economic Technical Social Political
doi.org/10.2777/1524
Making FAIR a reality in the research ecosystem
Depends upon several stakeholders in the research ecosystem actively playing their parts to:
• deliver research infrastructures and tools
• harmonize the standards
• address policies, education and training
• overcome technical, social and cultural challenges
• identify motivators, credit and reward mechanisms
Making FAIR a reality in the research ecosystem
Diversity of methods:
• Metrics and indicators
• Automated and manual
Examples:
A crowded space: FAIR evaluation tools
Credit to:
A crowded space: European projects
Two pillars of FAIR
20% identifiers
80% metadata
https://doi.org/10.2777/1524
Better metadata for better data
“Most metadata field names and their values are not standardized or controlled”
“Even simple binary or numeric fields are often populated with inadequate values of different data types”
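To make the second quote concrete, here is a small sketch of what cleaning up a binary metadata field can involve. The accepted spellings in `TRUE_VALUES` and `FALSE_VALUES` are assumptions made for this example, not a community standard.

```python
from typing import Optional

# Assumed spellings seen in free-text "binary" fields; illustrative only.
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def normalize_binary(value: object) -> Optional[bool]:
    """Map a loosely typed binary field to True/False, or None if unrecognised."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None  # leave it to a curator rather than guess


# Values of different data types that all claim to fill the same field.
for raw in [" Yes ", 0, "N", True, "unknown"]:
    print(repr(raw), "->", normalize_binary(raw))
```

Returning `None` for unrecognised values keeps the ambiguity visible instead of silently coercing it.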
• Formats: conceptual model, conceptual schema, exchange formats
▪ To allow data to flow from one system to another
• Terminologies: controlled vocabularies, thesauri, ontologies
▪ To use the same word and refer to the same ‘thing’
• Guidelines: minimum information reporting requirements, or checklists
▪ To report the same core, essential information
• Identifiers: unambiguous, persistent and context-independent schema
▪ To identify data and metadata elements
Enable computer systems or software to understand information with sufficient accuracy to utilise (e.g., exchange, integrate) that information for an intelligent purpose
Better metadata for better data
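As a rough illustration of how a terminology and a guideline work together, the sketch below maps free-text organism labels to one term identifier and checks a record against a short list of required fields. The `SYNONYMS` table and the `REQUIRED_FIELDS` checklist are hypothetical; a real implementation would draw them from an ontology and from a community reporting guideline.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table; the term IDs are NCBI Taxonomy identifiers.
SYNONYMS: Dict[str, str] = {
    "homo sapiens": "NCBITaxon:9606",
    "human": "NCBITaxon:9606",
    "mus musculus": "NCBITaxon:10090",
    "mouse": "NCBITaxon:10090",
}

# Hypothetical minimum-information checklist for this example.
REQUIRED_FIELDS = ("organism", "sample_type", "protocol")


def to_term(label: str) -> Optional[str]:
    """Resolve a free-text label to a term ID, or None if it is not known."""
    return SYNONYMS.get(label.strip().lower())


def check_record(record: Dict[str, str]) -> List[str]:
    """List checklist and terminology problems found in a metadata record."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not record.get(name)]
    organism = record.get("organism")
    if organism and to_term(organism) is None:
        problems.append(f"unrecognised organism: {organism}")
    return problems


print(to_term("Human"), to_term("Homo sapiens"))  # same term for both labels
print(check_record({"organism": "human"}))
```

Two records that say "human" and "Homo sapiens" end up pointing at the same term, which is the property that lets a machine integrate them.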
Counts shown on the slide: 440+, 170+, 790+, ~1300, 18
• Guidelines, e.g.: MIAME, MIRIAM, MIQAS, MIX, MIGEN, ARRIVE, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
• Formats, e.g.: SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA, CML, MITAB, …
• Terminologies, e.g.: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …
• Identifiers, e.g.: EC number, URL, PURL, LSID, Handle, ORCID, RRID, InChI, IVOA ID, DOI, …
Standard Organization e.g.:
Grass-roots Groups, e.g.:
COMMUNITY STANDARDS for metadata and identifiers
Community standards
DATA & METADATA STANDARDS
REPOSITORIES: databases and knowledgebases
DATA POLICIES: by funders, journals and other organizations
We need tools and services, and we need to foster a culture change towards one where the use of standards and repositories is pervasive and seamless
Turning FAIR into reality
DATA & METADATA STANDARDS
REPOSITORIES: databases and knowledgebases
DATA POLICIES: by funders, journals and other organizations
FAIRsharing provides curated, community-vetted descriptions and knowledge graphs representing these resources and their interlinks
An informative and educational resource
A flagship output of and a WG in:
Recommended by funders, e.g.:
community
fairsharing.org/communities
Guides consumers to discover, select and use these resources with confidence
Helps producers to make their resources more visible, more widely adopted and cited
Example of use by research community
Groups and organizations in all disciplines have created Collections, which are tailored views on selected standards and/or repositories relevant to their needs or developed by them
A growing number of data management planning, data stewardship and FAIR assessment tools access FAIRsharing (via its API) to look up and select standards and repositories
Example of use by third party tools
Including:
ds-wizard.org
dmponline.dcc.ac.uk
w3id.org/AmIFAIR
fairshake.cloud
Helping publishers to have clearer data policies
Publishers use FAIRsharing to:
• select standards and repositories they recommend in the data policy;
• discover new resources and monitor their evolution to refine or update their policies
Standards and repositories are the pillars of data sharing and FAIR data
“The interactive browser will allow us to discover which databases and standards are not currently included in our author guidelines, enabling us to regularly monitor and refine our policies as appropriate, in support of our mission to help our authors enhance the reproducibility of their work.”
H. Murray, Publishing Editor, F1000Research
https://doi.org/10.1038/s41587-019-0080-8
Open Access CC-BY
69 authors (adopters, collaborators, users) representing different stakeholder groups
We analyse the data policies by journals/publishers, and the standards and repositories they recommend
We conclude that the discrepancy in the recommendations can be overcome by working collaboratively under FAIRsharing
Promoting policy alignment among publishers
FAIRsharing works with DataCite and major publishers to align their data policies in relation to repositories and standards, which can be recommended to authors to improve data sharing practices
Data Repository Selection: Criteria That Matter
Pre-print:
https://doi.org/10.5281/zenodo.4084763
Promoting policy alignment among publishers
The road to data management and sharing
Before FAIR
After FAIR
… from chaos, comes order?
http://blogs.nature.com/scientificdata/2019/10/22/the-layered-cake
A FAIRy tale needs some magic
infrastructures
standards
tools
policies
education
training
cultural normalization
incentives
long-term investment
It is not simple, but it is no longer optional
Magic Roundabout, Swindon, UK
The FAIR Principles and FAIRsharing
