Making Data FAIR (Findable, Accessible, Interoperable, Reusable)

Making Data FAIR*
Tom Plasterer, PhD
Director, Bioinformatics, Research Bioinformatics
20 Mar 2019
* Findable, Accessible, Interoperable and Reusable

What FAIR: Principles at-a-Glance
Findable:
• F1 (meta)data are assigned a globally
unique and persistent identifier
• F2 data are described with rich metadata
• F3 metadata clearly and explicitly include
the identifier of the data it describes
• F4 (meta)data are registered or indexed in a
searchable resource
The FAIR Guiding Principles for scientific data management and stewardship
Sci. Data 3:160018 doi: 10.1038/sdata.2016.18 (2016)
Accessible:
• A1 (meta)data are retrievable by their identifier
using a standardized communications protocol
• A1.1 the protocol is open, free, and universally
implementable
• A1.2 the protocol allows for an authentication and
authorization procedure, where necessary
• A2 metadata are accessible, even when the data
are no longer available
Interoperable:
• I1 (meta)data use a formal, accessible,
shared, and broadly applicable language for
knowledge representation
• I2 (meta)data use vocabularies that follow
FAIR principles
• I3 (meta)data include qualified references to
other (meta)data
Reusable:
• R1 (meta)data are richly described with a plurality
of accurate and relevant attributes
• R1.1 (meta)data are released with a clear and
accessible data usage license
• R1.2 (meta)data are associated with detailed
provenance
• R1.3 (meta)data meet domain-relevant
community standards
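
As a rough illustration of how a few of these principles could translate into checks on a metadata record, the sketch below tests a plain dictionary for an identifier (F1), a reference to the data it describes (F3), a license (R1.1) and provenance (R1.2). The field names, the example.org identifiers and the `check_fair_basics` helper are assumptions made for this example; they are not part of the FAIR principles or of any published FAIR metric.

```python
from typing import Dict, List

# Hypothetical mapping from FAIR principle to the metadata field that
# would satisfy it. The field names are illustrative assumptions only.
REQUIRED_FIELDS: Dict[str, str] = {
    "F1": "identifier",       # globally unique, persistent identifier
    "F3": "data_identifier",  # metadata names the data it describes
    "R1.1": "license",        # clear and accessible usage license
    "R1.2": "provenance",     # detailed provenance
}


def check_fair_basics(record: Dict[str, str]) -> List[str]:
    """Return the principles whose field is missing or blank."""
    return [
        principle
        for principle, field in REQUIRED_FIELDS.items()
        if not record.get(field, "").strip()
    ]


# Made-up record: everything is filled in except the license.
example_record = {
    "identifier": "https://example.org/metadata/ds-1",
    "data_identifier": "https://example.org/dataset/ds-1",
    "license": "",
    "provenance": "Exported from a hypothetical LIMS on 2019-03-20",
}

missing = check_fair_basics(example_record)
print(missing)
```

A check like this only confirms that fields are present; whether an identifier is really persistent or a license really clear still needs human or registry-level review.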

Why FAIR: Biopharma Value Proposition
Collaborative & Competitive Intelligence:
• Who do we want to partner with? Are there complementary assets to our portfolio?
• Which spaces are too crowded and outside our area of expertise?
• Are there greenfield situations?
Mergers, Acquisitions, Partnerships:
• How do we efficiently and deeply absorb data generated elsewhere into our systems? How
do we efficiently share?
• Does this make a smaller biotech/start-up a more viable partner?
Improved Patient Care:
• Can we share data and outcomes more efficiently in complicated trial settings (basket trials,
adaptive trials) to better engage opinion leaders and foster dialog?
• Along with Differential Privacy approaches, can we have the broader research community
help mine our data?
• How do we best reuse Real World Evidence (RWE) data in the clinic and in trial design?
Data (Ir)reproducibility:
• Can we make preclinical data (more) reproducible?
• Can we utilize data credentialization? (thanks to Dan Crowther @ Exscientia)

Why FAIR: €26bn Reasons…

When FAIR: A Brief History
Moving away from Narrative
• Nanopublications
Incubating Standards in Open PHACTS
• VoID, PROV-O
Lorentz Center Workshop
• FORCE 11 FAIR Guiding Principles
• Participants: IMI members, US researchers,
Content providers, ELIXIR; European Open
Science Cloud, Big Data to Knowledge (BD2K)
Current Status:
• FAIR Data Workshops (EU-ELIXIR nodes)
• Inclusion in Horizon 2020, NIH Advocacy
• IMI2 Data FAIR-ification Call
• Vendors getting up to speed

When FAIR: Community Awareness
Linked Data Community of Practice
How familiar are you with the FAIR principles and metrics?

When FAIR: Getting Started
Linked Data Community of Practice
What is the maturity level of your organization with respect to implementation of FAIR?

How FAIR: Pistoia FAIR Implementation Group
• Business challenge:
- Effective application and analysis of data
assets in life science industry demands that
it is made Findable, Accessible,
Interoperable and Reusable
• Update and plans:
- Workshop at The Hyve, Utrecht NL, in June
2018 resulted in a published feature
article
- Workshop at EPAM, Boston US, in Dec
2018 contributed to the business case
thinking
- Phase 1 plans for 2019:
• Develop the business case to define
distinctive role for the project
• Develop the FAIR Toolkit concept
• Select a use case: e.g. clinical science
to engage with CROs at a workshop
- Seeking more funding – join us!
PM: Ian Harrow
Collaborators
FAIR Toolkit Implementation for LS Industry:
1. Metric Tools & Best Practice
2. Training resources
3. Culture change process
4. Use case examples
5. Cost benefit examples
• Adapt for Life Science industry
• Leverage existing FAIR resources

How FAIR: Pistoia Ontologies Mapping Project
• Business challenge:
– Use of different ontologies within
same data domain hampers
interoperability and application.
Solve by mapping between them.
• Update and plans:
– Phase 3 completed by end of 2018
• Predicted mappings delivered as a
prototype Ontology Mapping Service
for phenotype and disease domain
• Mappings will be available through
public wiki and OxO mapping repository
at EMBL-EBI
• The mapping algorithm, Paxo, is openly
available on GitHub
– Phase 4 plans for 2019:
• To extend mapping of biological and
chemical ontologies for support of
laboratory analytics
• FAIR implementation is planned
– Seeking more funding – join us!
Partners
PM: Ian Harrow
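
To make the idea of "mapping between ontologies" concrete, here is a minimal sketch in which a lookup table of term-to-term mappings is used to bring annotations from one ontology into another. It does not use or imitate the Paxo algorithm or the OxO repository; the ontology prefixes, term identifiers and the `harmonise` helper are hypothetical placeholders invented for this example.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (ontology prefix, term identifier)

# Hypothetical cross-ontology mappings. Prefixes and IDs are placeholders,
# not real ontology terms.
MAPPINGS: Dict[Term, Term] = {
    ("ONT_A", "A:0001"): ("ONT_B", "B:9001"),
    ("ONT_A", "A:0002"): ("ONT_B", "B:9002"),
}


def harmonise(annotations: List[Term], mappings: Dict[Term, Term]) -> List[Term]:
    """Rewrite each annotation to its mapped term; keep unmapped terms as-is."""
    return [mappings.get(term, term) for term in annotations]


# Two datasets in the same domain, annotated with different ontologies.
annotations = [("ONT_A", "A:0001"), ("ONT_B", "B:9002"), ("ONT_A", "A:0404")]
harmonised = harmonise(annotations, MAPPINGS)
print(harmonised)
```

Terms without a mapping are passed through unchanged here so that nothing is silently dropped; a real service would more likely flag them for curation.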

How FAIR: Implementation Networks

How FAIR:
Overview:
• ELIXIR - Project Coordinator & Janssen - Project Leader
• 22 participants with 12 academic, 7 EFPIA, 3 SME
• €8.23M budget with €4M H2020 EC funding + €4.23M EFPIA in-kind
• 42 months
Goals:
• Establish a value-based process for prioritization and selection of IMI project databases
• Develop FAIRification toolkit e.g. develop guidelines, tools and metrics - FAIR Cookbook
• Apply this toolkit to FAIRify datasets from selected IMI projects and EFPIA companies
• Deliver training for data handlers (academia, SMEs and pharmaceuticals) to change and
sustain the data management culture
• Foster an innovation ecosystem on FAIR open data to power future reuse, knowledge
generation and societal benefit, e.g. FAIR innovation and SME events
Members:
PM: Serena Scollen

Start FAIR: Find me Datasets about:
• Projects
• Study
• Indication/Disease
• Technology
• Targets
• Cohort
• Dates
• Agent
• Therapeutic Area
• Drugs

Start FAIR: Findability Starts with Catalogs
A Dataset Catalog is a collection of Dataset Records
• Catalogs are needed to support FAIR (Findable) data
• Catalogs can and should support Enterprise MDM strategies
• Consumers can be internal or external
Dataset Catalogs are needed so data consumers can find Datasets
• Dataset records need sufficient metadata to support discoverability
• Dataset terms are NOT the data instance
Dataset Catalogs surface dataset provenance and enable data access
Dataset Catalogs can provide datasets for multiple consumption patterns
• Analytics readiness and fit
• ‘Walking’ across information models

Start FAIR: A DCAT conformant Data Catalog
https://www.w3.org/TR/hcls-dataset/
https://www.w3.org/TR/vocab-dcat/#vocabulary-overview
Semantic tagging of datasets with
concepts from taxonomies:
• provides context
• multi-dimensional & flexible
• effective for discoverability
• light-weight semantics
skos:Concept
dcat:Catalog skos:ConceptScheme
dctypes:Dataset (summary)
dct:title
dct:publisher <foaf:Agent>
foaf:page
void:sparqlEndpoint
dct:accrualPeriodicity
dcat:keyword
dcat:dataset
dcat:theme
dctypes:Dataset (version)
dcat:Distribution
(dctypes:Dataset)
void:vocabulary
dct:conformsTo
void:exampleResource
…other void properties
dcat:distribution
dcat:themeTaxonomy
dct:isVersionOf
pav:previousVersion
dct:hasPart
pav:hasCurrentVersion
dct:hasPart
dct:title
pav:version
dct:creator <foaf:Agent>
dct:created
dct:source
dct:creator <foaf:Agent>
dct:license
dct:format
pav:retrievedFrom
dct:created
pav:createdWith
dcat:accessURL
dcat:downloadURL
void:Dataset
dct:title
dct:description
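
The vocabulary terms above can be hard to picture as a flat list, so the following sketch assembles a tiny catalog record as JSON-LD using only the Python standard library. It uses a handful of the listed properties (dct:title, dct:description, dcat:keyword, dcat:dataset, dcat:distribution, dcat:downloadURL, dct:license). The example.org URLs, the titles and the `make_dataset` helper are made up for illustration, and the record collapses the summary/version/distribution levels into one dataset for brevity, so it should be read as a simplified sketch rather than a profile-conformant description.

```python
import json
from typing import Dict, List

# Namespace prefixes for the vocabularies used on the slide.
CONTEXT: Dict[str, str] = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
}


def make_dataset(dataset_id: str, title: str, description: str,
                 keywords: List[str], download_url: str,
                 license_url: str) -> Dict[str, object]:
    """Build one dataset record with a single distribution (hypothetical helper)."""
    return {
        "@id": dataset_id,
        "@type": "dctypes:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dct:license": {"@id": license_url},
        },
    }


# All identifiers below are placeholders on example.org.
catalog = {
    "@context": CONTEXT,
    "@id": "https://example.org/catalog",
    "@type": "dcat:Catalog",
    "dct:title": "Example dataset catalog",
    "dcat:dataset": [
        make_dataset(
            dataset_id="https://example.org/dataset/ds-1",
            title="Example study dataset",
            description="Illustrative record, not a real dataset.",
            keywords=["asthma", "RNA-seq"],
            download_url="https://example.org/dataset/ds-1.csv",
            license_url="https://example.org/licenses/internal-use",
        )
    ],
}

catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```

Because the catalog holds only descriptions and links, it can be published and searched even when the underlying data files sit behind access control.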

Start FAIR: Dataset to Knowledge Graph to Analytics
Phase 1: Data Catalog Filter
Phase 2: Experiment Metadata Filter
Phase 3: Ad hoc Analyses Filtering
Outbound to Data Analytics
Data Science Tools
Statistical Filtering
e.g., clinical trial with > 50 participants
Dataset Catalog Descriptions
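
The three filtering phases can be pictured as a chain of predicates applied to catalog records, each phase narrowing what the next one sees. The sketch below is one possible reading of the slide, not the author's implementation: the record fields, the sample records and the choice to place the "clinical trial with > 50 participants" example in the first phase are all assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Phase = Callable[[Record], bool]


def run_phases(records: Iterable[Record], phases: List[Phase]) -> List[Record]:
    """Apply each phase in order, keeping only records that pass it."""
    remaining = list(records)
    for keep in phases:
        remaining = [record for record in remaining if keep(record)]
    return remaining


def catalog_filter(record: Record) -> bool:
    # Phase 1 (assumed): dataset-level description, e.g. trial size.
    return (record.get("study_type") == "clinical trial"
            and record.get("participants", 0) > 50)


def experiment_filter(record: Record) -> bool:
    # Phase 2 (assumed): experiment metadata, e.g. the assay used.
    return record.get("assay") == "RNA-seq"


def adhoc_filter(record: Record) -> bool:
    # Phase 3 (assumed): ad hoc criterion chosen by the analyst.
    return record.get("indication") == "asthma"


# Made-up catalog records for illustration.
records = [
    {"id": "ds-1", "study_type": "clinical trial", "participants": 120,
     "assay": "RNA-seq", "indication": "asthma"},
    {"id": "ds-2", "study_type": "clinical trial", "participants": 30,
     "assay": "RNA-seq", "indication": "asthma"},
    {"id": "ds-3", "study_type": "clinical trial", "participants": 200,
     "assay": "proteomics", "indication": "asthma"},
    {"id": "ds-4", "study_type": "preclinical", "participants": 0,
     "assay": "RNA-seq", "indication": "asthma"},
]

selected = run_phases(records, [catalog_filter, experiment_filter, adhoc_filter])
selected_ids = [record["id"] for record in selected]
print(selected_ids)
```

Keeping each phase as a separate function means the cheap catalog-level filter runs first, and only the surviving datasets need their more detailed metadata inspected.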

R&D | RDI
FAIR-for-Biopharma: Take-aways
Why FAIR?
• Cost avoidance, business advantage, data stewardship
When FAIR?
• Now! Peers, especially in Europe, are doing it
How FAIR?
• FAIRplus, GO-FAIR, Pistoia FAIR Implementation Group
Start FAIR
• Findability first; adopt a FAIR-compliant Data Catalog

Thanks
Key Influencers
David Wood
Tim Berners-Lee
Lee Harland
Jane Lomax
James Malone
Dean Allemang
Barend Mons
Carole Goble
Bernadette Hyland
Bob Stanley
Eric Little
Michel Dumontier
John Wilbanks
Hans Constandt
Filip Pattyn
Tim Hoctor
Kees van Bochove
Serena Scollen
AstraZeneca/Pistoia FAIR
Data Community
Mathew Woodwark
Rajan Desai
Nic Sinibaldi
Chia-Chien Chiang
Kerstin Forsberg
Ola Engkvist
Ian Dix
Colin Wood
Ted Slater
Martin Romacker
Eric Neumann
John Wise
Carmen Nitsche
Ian Harrow
Jeff Saltzman
Kathy Reinold
