Overview of FAIR, FAIRsharing and the FAIR Cookbook at the ATI event on Knowledge Graphs: https://github.com/turing-knowledge-graphs/meet-ups/blob/main/symposium-2022.md
1. FAIR, community standards and data FAIRification: components and recipes
1st Annual Symposium of the Alan Turing Institute’s Knowledge Graphs IG, 17 June, 2022
Slides: https://www.slideshare.net/SusannaSansone
Group: datareadiness.eng.ox.ac.uk
Susanna-Assunta Sansone
Professor of Data Readiness; Associate Director, Oxford e-Research Centre
ORCiD: 0000-0001-5306-5690
Twitter: @SusannaASansone
Philippe Rocca-Serra
Associate Member of Faculty; Group Coordinator, Oxford e-Research Centre
ORCiD: 0000-0001-9853-5668
Twitter: @Phil_at_OeRC
3. Discoveries are made using shared data and this requires data that are:
• Cited and stored to be discoverable
• Retrievable and structured in standard format(s)
• Richly described to be understandable
Rationale behind the FAIR Principles
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#276a35e6f637
Data preparation accounts for about 80% of the work of data scientists
4. Use of data at scale by humans and machines
• Globally unique and persistent identifiers
• Community-defined descriptive metadata
• Community-defined terminologies
• Detailed provenance
• Terms of access
• Terms of use
7. Making FAIR a reality in the research ecosystem
doi.org/10.2777/1524
8. An intergovernmental organisation that brings together life science resources from across Europe, to coordinate them so that they form a single infrastructure
9. ELIXIR: a sustainable infrastructure for biological data
Food & Nutrition + Toxicology
10. #ELIXIR22
Some examples:
• Projects and communities, including global initiatives, e.g. NEW: RDA Life Science Infrastructure IG with Australian BioCommons, the US NIH Office of Data Science Strategy, and H3ABioNet in Africa
• IMI2 project guidelines for open access to publications and research data
• Funders’ guidelines
• ELIXIR Interoperability Platform: FAIR Service Framework
11. IMI2 project guidelines for open access to publications and research data
Recommended by European funders
ELIXIR FAIR Service Framework: focus on two resources, one covering all disciplines and one life science focused
13. An informative and educational resource, and a service
FAIRsharing provides curated descriptions and relationship graphs of
standards, databases and policies in all disciplines
COMMUNITY STANDARDS: identifiers, terminologies, guidelines, formats
POLICIES by funders, journals and other organizations
DATABASES including repositories and knowledgebases
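Because FAIRsharing records are linked (a policy recommends a database, a database implements a standard), a consumer can walk those links. The Python sketch below is illustrative only: the record names and the two relation types `recommends` and `implements` are hypothetical and do not reproduce FAIRsharing's actual data model or API. It answers one typical question: which standards are implemented by the databases that a given policy recommends?

```python
from collections import defaultdict

# Hypothetical records and relation types for illustration only;
# this is not FAIRsharing's actual data model.
RECORD_TYPES = {
    "policy:journal-x": "policy",
    "database:repo-a": "database",
    "database:repo-b": "database",
    "standard:format-1": "standard",
    "standard:ontology-2": "standard",
    "standard:checklist-3": "standard",
}

RELATIONS = [
    ("policy:journal-x", "recommends", "database:repo-a"),
    ("policy:journal-x", "recommends", "database:repo-b"),
    ("database:repo-a", "implements", "standard:format-1"),
    ("database:repo-a", "implements", "standard:ontology-2"),
    ("database:repo-b", "implements", "standard:ontology-2"),
]


def build_index(relations):
    """Index outgoing edges as {(source, relation): set of targets}."""
    index = defaultdict(set)
    for source, relation, target in relations:
        index[(source, relation)].add(target)
    return index


def standards_for_policy(policy, index):
    """Standards implemented by any database that the given policy recommends."""
    found = set()
    # .get() does not create empty entries in the defaultdict.
    for database in index.get((policy, "recommends"), set()):
        for target in index.get((database, "implements"), set()):
            if RECORD_TYPES.get(target) == "standard":
                found.add(target)
    return found


index = build_index(RELATIONS)
print(sorted(standards_for_policy("policy:journal-x", index)))
# ['standard:format-1', 'standard:ontology-2']
```

A plain adjacency index keeps the example dependency-free; a real registry graph would more likely be queried through its own API or a graph store.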
14. Standards to report metadata and data: the pillars of FAIR
• Formats: conceptual model, conceptual schema, exchange formats to represent, contain and move information
• Terminologies: controlled vocabularies, thesauri, ontologies to disambiguate terms and enable semantic relationships
• Guidelines: minimum information reporting requirements, or checklists, to report the same core, essential information
• Identifiers: unambiguous, persistent and context-independent schema to identify data and metadata elements
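Well-designed identifiers can also be checked by machines. ORCID iDs, such as those listed for the authors on the title slide, end in an ISO 7064 MOD 11-2 check character. The Python sketch below validates the layout and that check character; it does not confirm that an iD is actually registered, which would require a lookup against ORCID itself.

```python
import re

# Four groups of four characters; only the final character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the ORCID iD layout and its final check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


for orcid in ("0000-0001-5306-5690", "0000-0001-9853-5668", "0000-0001-9853-5669"):
    print(orcid, is_valid_orcid(orcid))
# 0000-0001-5306-5690 True
# 0000-0001-9853-5668 True
# 0000-0001-9853-5669 False
```

The check character catches single-digit typos and most transpositions, which is one reason a persistent identifier scheme is more robust than free-text names.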
17. Understanding their life cycle and landscape
Standard organizations, e.g.:
• Industry-level standards
• Mostly regulator-driven
• Participation is often regulated
• Standards are sold or licensed
• Formal development process, often less flexible, can be lengthy
• Charges apply to advanced training or programmatic access
Grass-roots groups, e.g.:
• Mostly research-level standards
• Open to any interested party
• Volunteering efforts
• Standards are free to use
• Development process varies, more flexible and adaptable to changes
• Little or no funding to carry out the work, let alone provide training
18. Standards: known pain points
DOI: 10.6084/m9.figshare.4055496.v1
Technical and social challenges, including:
• Fragmentation, harmonization and extensions
• Governance and ownership
• Indicators and evaluation methods
• Implementations, tools and services
• Synergies between basic and clinical/medical areas
• Credit and incentives for contributors
• Education, documentation and training
• Funding streams to support the ‘standard life cycle’
• Business models for sustainability
20. Guides consumers to discover, select and use these resources with confidence
Helps producers to make their resources more visible, more widely adopted and cited
Promoting the value of standards, their use in repositories and adoption by policies
Over 3639 resources in total (June 2022): repositories, standards, policies
23. Building and comparing “FAIR profiles”
Translational Medicine
Clinical Developments
URL: https://fairsharing.org/3519 (work in progress!)
A collaboration with their FAIR Implementation WG
Disclaimer: These profiles speak for a limited community and do not represent any company standards
24. Clinical Developments
Snapshot of the semantic and syntactic standards used
25. The FAIRness of FAIRsharing
Findability
• Sitemap.xml, JSON
• Markup with Schema.org for search indexes
• DOI unique persistent identifiers for each record
• ORCID for author credit and authentication
Accessibility
• Read/write REST API
• Read OAI-PMH
Interoperability
• JSON markup
• Standardized semantics
• Cross-links to, or import from, records in other registries
• ROR for organizations (ongoing)
• FundRef for funders (ongoing)
Reusability
• CC BY 4.0 license
• JSON export
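Read access over OAI-PMH means any generic harvester can page through the records. The sketch below follows the OAI-PMH 2.0 protocol (`ListRecords` with `metadataPrefix=oai_dc`, then `resumptionToken` until it comes back empty) using only the Python standard library. `BASE_URL` is a placeholder, not FAIRsharing's endpoint, and `SAMPLE_PAGE` is an invented response so the parsing can be shown offline; in practice `fetch` could wrap `urllib.request.urlopen`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Placeholder endpoint: substitute the provider's documented OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    # An empty <resumptionToken/> marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def harvest(base_url, fetch):
    """Follow resumption tokens until the last page; fetch(url) returns XML text."""
    records, token = parse_list_records(fetch(list_records_url(base_url)))
    while token:
        page, token = parse_list_records(fetch(list_records_url(base_url, token)))
        records.extend(page)
    return records


# Invented response page, shaped like OAI-PMH 2.0 with oai_dc metadata.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example standard</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
print(list_records_url(BASE_URL))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(records)
# [{'identifier': 'oai:example.org:1', 'title': 'Example standard'}]
print(list_records_url(BASE_URL, token))
# https://example.org/oai?verb=ListRecords&resumptionToken=page-2
```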
27. FAIRsharing: working with and for all stakeholders
Users, adopters and collaborators include: https://fairsharing.org/communities
An endorsed output of the FAIRsharing WG (since 2015):
A WG (since 2015) in:
A recommended resource in EOSC reports
Users from all stakeholder groups: researchers; developers and curators; journal publishers; societies and alliances; librarians and trainers; funders
29. A 10,000-foot view of the EOSC Research Graph
“Open, participatory research graph where products of the
research life-cycle (e.g. scientific literature, research data,
project, software) are semantically linked to each other and carry
information about their access rights (i.e. if they are Open
Access, Restricted, Embargoed, or Closed) and the sources
from which they have been collected and where they are hosted”
URL: graph.openaire.eu
30. ELIXIR wants to make sure our data resources are visible in this EOSC Research Graph
● Ensuring that ELIXIR resources and services, starting from databases, knowledge bases and repositories, are more discoverable in EOSC
○ Via the EOSC Research Graph by OpenAIRE
● Note that currently our understanding of this graph is limited
○ It is still unclear what the specific use cases are, how granular it will be, etc.
This ELIXIR work is by:
Allyson Lister, FAIRsharing
Alasdair Gray, Bioschemas
and contributors
32. FAIRsharing and Bioschemas as information providers
● Provides rich descriptions of the databases, incl. content types, access conditions, maintainers (ORCiD), organizations (ROR) etc., also interlinked with standards
● And we already have descriptions of many/most ELIXIR and EOSC-Life databases and standards
● Provides key metadata for discoverability of dataset content
● Many ELIXIR databases’ pages are already marked up
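Marking up a page means embedding a schema.org description as JSON-LD inside a `<script type="application/ld+json">` element, which search engines and harvesters read. The Python sketch below builds a minimal schema.org `Dataset` description; the chosen properties and all example values are illustrative assumptions. The relevant Bioschemas profile defines which properties are minimum or recommended, and how to declare conformance, so check it before relying on this shape.

```python
import json


def dataset_jsonld(name, description, url, identifier, keywords, license_url):
    """Build a minimal schema.org Dataset description (illustrative fields only)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
        "license": license_url,
    }


def script_tag(jsonld):
    """Wrap JSON-LD in the <script> element that crawlers read from a page."""
    body = json.dumps(jsonld, indent=2)
    # "<" only occurs inside JSON strings here, so escaping it as \u003c stays
    # valid JSON and stops a value containing "</script>" closing the tag early.
    body = body.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical example values.
record = dataset_jsonld(
    name="Example dataset",
    description="Illustrative dataset description",
    url="https://example.org/datasets/1",
    identifier="https://example.org/id/1",
    keywords=["omics", "clinical"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(script_tag(record))
```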
33. Prototyping the process (unfunded)
Mapping and harvesting
We are mapping Bioschemas to the DataCite schema (Enrico Ottonello, Andreas Czerniak, Nick Juty, Alasdair J. G. Gray)
We are mapping the FAIRsharing model and database IDs to the OpenAIRE model (Ramon Granell, Alessia Bardi, Delphine Dauga, Allyson Lister)
OpenAIRE retrieves general info from FAIRsharing and follows the link to the sitemap, where it harvests the Bioschemas markup
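The slide does not show the Bioschemas-to-DataCite mapping itself. As a rough illustration only, the Python sketch below maps a few schema.org `Dataset` properties onto attribute names used by the DataCite JSON REST API (`titles`, `creators`, `publicationYear`, `subjects`, `rightsList`, `types`), with `publisher` simplified to a plain string. It is a hypothetical crosswalk, not the mapping being produced by the team named above.

```python
def schemaorg_to_datacite(ds):
    """Map a few schema.org Dataset properties to DataCite-style attributes.

    Illustrative crosswalk only; a real mapping covers many more properties
    and validates the result against the DataCite Metadata Schema.
    """
    creators = ds.get("creator", [])
    if isinstance(creators, dict):  # schema.org allows a single object or a list
        creators = [creators]
    attrs = {
        "titles": [{"title": ds["name"]}],
        "creators": [{"name": c["name"]} for c in creators],
        "types": {"resourceTypeGeneral": "Dataset"},
    }
    if "description" in ds:
        attrs["descriptions"] = [
            {"description": ds["description"], "descriptionType": "Abstract"}
        ]
    if "datePublished" in ds:
        # DataCite wants a year; schema.org dates are ISO 8601 strings.
        attrs["publicationYear"] = int(ds["datePublished"][:4])
    if "publisher" in ds:
        pub = ds["publisher"]
        attrs["publisher"] = pub["name"] if isinstance(pub, dict) else pub
    keywords = ds.get("keywords", [])
    if isinstance(keywords, str):  # comma-separated text is also allowed
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    if keywords:
        attrs["subjects"] = [{"subject": k} for k in keywords]
    if "license" in ds:
        attrs["rightsList"] = [{"rightsUri": ds["license"]}]
    return attrs


# Hypothetical input record.
example = {
    "name": "Example dataset",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "datePublished": "2022-06-17",
    "keywords": "omics, clinical",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(schemaorg_to_datacite(example))
```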
1. They register (or claim) their database, adding (or vetting) additional descriptors, including:
- maintainers, as individuals and organizations
- publications
- data access conditions
- standards implemented
2. They specify the Bioschemas access points
1. ELIXIR members mark up their database’s pages, including links to:
- containing dataset
- publications
- equivalent resources in other sites
2. Create a sitemap
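The harvesting route described above (read the sitemap, visit each listed page, collect its Bioschemas JSON-LD) can be sketched with the Python standard library alone. This is an illustrative sketch, not OpenAIRE's harvester; the sitemap and page below are invented, and network fetching is left out so the parsing can be shown offline.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """List the <loc> URLs of a standard XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)]


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script text may arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(page_html):
    parser = JsonLdExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.blocks


# Invented inputs for the demo.
SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.org/db/1</loc></url>"
    "<url><loc>https://example.org/db/2</loc></url>"
    "</urlset>"
)
PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Dataset", "name": "Example"}</script>'
    "<script>var x = 1;</script></head><body></body></html>"
)

for url in sitemap_urls(SITEMAP):
    print(url)  # a real harvester would fetch each page here
print(extract_jsonld(PAGE))
# [{'@type': 'Dataset', 'name': 'Example'}]
```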
41. FAIRsharing: building maps of resources
These are profiles of the organizations and their RIs, with their data resources and standards
URL: fairsharing.org/graph/3513
42. FAIRsharing: working with communities to create subject-specific collections of resources
Collection URL: fairsharing.org/graph/3515; each record has a DOI
Collection URL: fairsharing.org/graph/3513; each record has a DOI
43. Stakeholder Advisors
● Amye Kenall, VP of Publishing and Product, Research Square
● Adam Leary, Oxford University Press
● Catriona MacCallum, Hindawi
● Dagmar Meyer, European Research Council, Executive Agency
● Dominic Fripp, JISC, UK
● Emma Ganley, Protocols.io
● Geraldine Clement-Stoneham, Medical Research Council
● Helena Cousijn, DataCite
● Iain Hrynaszkiewicz, PLoS
● Imma Subirats, FAO of the United Nations
● Kiera McNiece, Cambridge University Press
● Luiz Olavo Bonino, GO-FAIR
● Marina Soares E Silva and Sarah Callaghan, Elsevier
● Michael Ball, Biotechnology and Biological Sciences Research Council
● Mike Huerta, NIH National Library of Medicine
● Molly Cranston and Guillaume Wright, F1000Research
● Nick Everitt and Matthew Cannon, Taylor and Francis
● Scott Edmunds, GigaScience, Oxford University Press
● Simon Hodson, CODATA
● Theo Bloom, British Medical Journal
● Thomas Lemberger, EMBO Press
● Wei-Mun Chan, eLife
● Sowmya Swaminathan, Springer Nature
Current Operational Team
● Allyson Lister, Content and Community Lead
● Milo Thurston, Technical Lead
● Ramon Granell, Data Enrichment & Quality Manager
● Delphine Dauga, Data Curator Manager
● Hiring in progress, Web Developer
● Dominique Batista, Research Software Engineer
● Philippe Rocca-Serra, Co-Founder
● Susanna-Assunta Sansone, PI and Founder
● and many collaborators and contributors!
Executive Advisors
● Varsha Khodiyar, Independent expert
● Chris Graf, Springer Nature
● David Carr, Independent expert
● Robert Hanisch, Director, NIST Office of Data & Informatics
● Peter McQuilton, FAIRsharing Founding Member, GSK
46. Motivations and ambitions beyond the hype
Motivations
• Large body of generic FAIR guidance
• Guidance not specific to the life sciences
• Lack of practical examples of ‘how-to’ with different data types and scenarios
Ambitions
• Target specific situations to deliver a guide with applied examples
• Join academia and industry forces to make the case for FAIR data management
• Build capacity for high-quality data management in the private and public sectors
47. FAIR-driven digital transformation by pharmas
● Biopharma R&D productivity can be improved
by implementing the FAIR Principles
● FAIR enables powerful new AI analytics to
access data for machine learning and prediction
● Requirements
○ financial, technical, training
● Challenges
○ change the culture, show business value,
achieve the ‘FAIR enough’ on an
48. The FAIR Cookbook
What is it?
An online, ‘live’ resource for the life sciences: a collection of recipes that cover the operational steps of FAIR data management
Who is it for?
Anyone who needs practical assistance in their FAIRification journey, or who creates FAIR guidance and educational material
Who developed it?
Researchers and data management professionals in the life sciences, from academia and industry, including ELIXIR members
https://faircookbook.elixir-europe.org
49. Learning objectives and who is the Cookbook user?
Users:
• Researchers, Data Scientists, Principal Investigators
• Data Managers, Data Stewards, Data Curators
• Software Developers, Terminology Managers
• Policymakers, Funders, Trainers
Learning objectives:
• Learn how to improve FAIRness with exemplar datasets
• Understand the levels and indicators of FAIRness
• Discover open source technologies, tools and services
• Find out the required skills
• Acknowledge the challenges
50. The Cookbook platform: open source community practices
Built using Jupyter Book, following the practice of the Alan Turing Institute’s The Turing Way book, an open source, community-driven guide to reproducible, ethical, inclusive and collaborative data science
Technology stack:
● GitHub for version control and hosting
● Markdown for the write-up
● HackMD Markdown editor, integrated with GitHub
● Jupyter Notebook for executable code
● Binder for web execution of the Jupyter notebooks distributed with a recipe
● Mermaid JavaScript library for flowcharts, Gantt charts and pie charts
https://the-turing-way.netlify.app/welcome
51. Current coordinators and editors: we are here to help
• Content prioritisation
• Identification of topics
• Review of drafts
• Call for contributions
• Monthly book-dash events: pre-defined focus areas, breakout on topics
• Housekeeping: technical platform, website
Editorial Board and Sections Board: the newly developed sections board assists the editorial board
Dominique Batista
Martin Cook
52. Contributors to date
Almost 100 life sciences professionals, researchers and data managers
FAIRplus partners: industry + academia
ELIXIR Nodes represented
53. A resource for FAIR doers
● Over 70 recipes released and more content available
● Covering technical processes with FAIRification examples in the life sciences:
○ Omics
○ Pre-clinical
○ Clinical areas
But not limited to these!
54. Content type overview
• Recipes focused on each technical aspect, and applicable to any data type
• Recipes specific to a topic or data type
59. Anatomy of a recipe: components
• Ingredients: an idea of the tools/skills needed
• Step-by-step process: guidelines, process, description
• Practical elements: code snippets, e.g. from the zooma-annotator-script.py file (Python 3): def get_annotations(propertyType, propertyValues, filters="")
• Examples
• Conclusions
• What should I read next?
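The recipe's own script is only glimpsed on the slide. As a hedged illustration of the kind of step such a recipe walks through, the sketch below builds a request to the EMBL-EBI ZOOMA annotation service and keeps only confident ontology matches. The endpoint URL, the `propertyValue`/`propertyType`/`filter` parameters and the `confidence`/`semanticTags` response fields are assumptions based on ZOOMA's public API and should be checked against its documentation; `zooma_query_url` and `confident_tags` are hypothetical helpers, not code from zooma-annotator-script.py.

```python
import json
from urllib.parse import urlencode

# Assumed ZOOMA endpoint and parameter names; verify against the ZOOMA docs.
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def zooma_query_url(property_value, property_type=None, filters=""):
    """Build an annotate request for one free-text value (hypothetical helper)."""
    params = {"propertyValue": property_value}
    if property_type:
        params["propertyType"] = property_type
    if filters:  # e.g. "required:[none],ontologies:[efo]"
        params["filter"] = filters
    return f"{ZOOMA_ANNOTATE}?{urlencode(params)}"


def confident_tags(response_json, accepted=("HIGH", "GOOD")):
    """Keep ontology term IRIs from annotations at an accepted confidence."""
    tags = []
    for annotation in json.loads(response_json):
        if annotation.get("confidence") in accepted:
            tags.extend(annotation.get("semanticTags", []))
    return tags


# Invented response shaped like the assumed ZOOMA output (no network call).
SAMPLE_RESPONSE = json.dumps([
    {"confidence": "HIGH", "semanticTags": ["http://example.org/term/1"]},
    {"confidence": "LOW", "semanticTags": ["http://example.org/term/2"]},
])

print(zooma_query_url("mus musculus", "organism", "required:[none],ontologies:[efo]"))
print(confident_tags(SAMPLE_RESPONSE))
# ['http://example.org/term/1']
```

Filtering by confidence is a design choice: low-confidence matches are better routed to a curator than written straight into the metadata.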
61. The FAIRness of the Cookbook
Findability
• Sitemap.xml, JSON-LD
• Markup with Schema.org, Bioschemas
• w3id.org unique persistent identifiers for each recipe
• ORCID for authors
Accessibility
• HTTPS protocol
Interoperability
• JSON-LD markup
• Cross-links to objects in other registries, from the ELIXIR ecosystem and beyond!
• CRediT attribution ontology
Reusability
• CC BY 4.0 license for all content
64. Tagging recipes with ‘Dataset Maturity Indicators’
Maturity level and indicators: a newly available feature
https://fairplus.github.io/Data-Maturity
Provides insight into the FAIR maturity reached by applying a specific recipe to improve a dataset
67. What is missing?
Demonstrated benefits of FAIR for Reuse
• HPLC data FAIRification for AI: @Abbvie, Axel Wilbertz
• Image data annotation: @AZ, Alexander Buschle
• Integration of GWAS data: @BenevolentAI, Jia Li
• Biomap project: @UL, Irina Balaur, Soumyabrata Gosh
68. • Increasing number of (competing) semantic resources
• “Let 1000 flowers bloom” is nice but…
• Converging towards shared curation practice (precompetitive) and
efficient patterns
The challenge of interoperability: FAIR silos?
https://arxiv.org/pdf/2102.10062.pdf https://arxiv.org/pdf/2112.06567.pdf
69. Become part of a community of FAIR experts!
1. Identify a chapter and a topic: Findability, Accessibility, Interoperability, Reusability, Infrastructure, Applied examples, Assessment
2. Choose a way of contributing and see our guidelines: Google Docs, HackMD, Git; Markdown cheat sheet, get recipe template, tips and tricks
3. Submit an outline: you can discuss it with the Editorial Board
70. The roadmap, your steps
• Start using the FAIR Cookbook: help us to help you, this is a user-oriented resource
• Join a network of FAIR experts: share and broaden your expertise, joining our journey
• Contribute according to your needs: create and review, or signpost gaps and needs
• Benefit from the power of the FAIR community: become a recognised expert, enlarge your network
• Recommend it in your guidance and training material
For everyone: use it, join it, leverage it, adopted!
71. Thanks to
Editorial Board
Section Editors
FAIRplus partners
All bookdashes’ participants
All authors
fairplus-cookbook@elixir-europe.org
faircookbook.elixir-europe.org
fairplus-project.eu
This project has received funding from the Innovative Medicines Initiative Joint Undertaking under grant agreement No 802750. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA Companies. This communication reflects the views of the authors and neither IMI nor the European Union, EFPIA or any Associated Partners are liable for any use that may be made of the information contained herein.