COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable Accessible Interoperable Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, are now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations, and the portability and sustainability of processing methods, to enable transparent and reproducible results. All this happens within the context of a bottom-up society of collaborating (or burdened?) scientists, a top-down collective of compliance-focused funders and policy makers, and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations, not standards. They are multi-dimensional and depend on context, such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile”: “FAIRifying” destination community archive repositories and measuring their “compliance” with FAIR metrics (or, less controversially, “indicators”). But what about FAIR at the first mile, at source? How do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? And what about FAIR beyond just data, such as exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner research projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of high-throughput sequencing (HTS) precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars, let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are with FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
1. Let’s go on a FAIR Safari!
Prof Carole Goble
The FAIRDOM Consortium
ELIXIR UK Head of Node
BioCompute Object Partnership
The University of Manchester, UK
carole.goble@manchester.ac.uk
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
2. A European standardization framework for data integration and data-driven in silico models for personalised medicine
Harmonised transnational standards, recommendations and guidelines that allow a broad application of predictive in silico methodologies in personalised medicine across Europe.
3. A European standardization framework for data integration and data-driven in silico models for personalised medicine
4. Scientific Data 3, 160018 (2016)
doi:10.1038/sdata.2016.18
A potted history
Many went before
2014 - Lorentz workshop
2015 - BioHackathon
2016 - Published
Went bananas
Grassroots activity that has become a top-down one.
5. sharing/publishing assets in public archives…
Data Models
*top three most popular
The evolution of standards and data management practices in systems biology
(2015). Stanford et al, Molecular Systems Biology, 11(12):851
6. … model reuse is tricky…
Stanford et al, The evolution of standards and data management practices in systems biology, Molecular Systems Biology (2015) 11: 851 DOI 10.15252/msb.20156053
COMBINE sessions on Reproducibility
7. … different repositories, owners, sovereignties, infrastructure, platforms…
The evolution of standards and data management practices in systems biology (2015).
Stanford et al, Molecular Systems Biology, 11(12):851
A jungle
An ecosystem
13. Cutting a path through the jungle…
PEST – political, economic, social, technical
What does it mean to be FAIR?
What is the cost/benefit analysis?
15. FAIR principles in the paper…
which some people seem to have taken as the law of the jungle…
16. FAIR Principles: machine-actionable data and metadata
Findable: find with machine-readable metadata; locate and identify with a standard identification mechanism
Accessible: available and obtainable by humans and machines; metadata always accessible
Interoperable: standards; semantically encoded, syntactically parsable; references
Reusable: sufficiently described; provenance; least restrictive licenses; community compliant
Increase exchange, integration and reuse across disciplines and borders
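To make “machine-actionable metadata” concrete, here is a minimal sketch of a record carrying the elements listed above: an identifier, descriptive metadata, a licence, provenance links and the community standards the data conforms to. The class, its field names and the example values are hypothetical illustrations, not a schema prescribed by the FAIR principles; the principle numbers in the comments follow the 2016 Scientific Data paper.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical machine-actionable metadata record for one dataset."""

    identifier: str         # globally unique, persistent identifier (F1)
    title: str              # descriptive metadata a search service can index (F2)
    license: str            # explicit usage licence, given as a URL (R1.1)
    conforms_to: List[str]  # community standards the data follows (R1.3)
    derived_from: List[str] = field(default_factory=list)  # provenance links (R1.2)

    def to_json(self) -> str:
        """Serialise to JSON so that software, not only people, can read it."""
        return json.dumps(asdict(self), sort_keys=True)


def missing_elements(record: dict) -> List[str]:
    """Return the required keys that are absent or empty in a parsed record."""
    required = ["identifier", "title", "license", "conforms_to"]
    return [key for key in required if not record.get(key)]


record = MetadataRecord(
    identifier="https://example.org/id/dataset-42",  # hypothetical identifier
    title="Example liver model parameter set",  # hypothetical title
    license="https://creativecommons.org/licenses/by/4.0/",
    conforms_to=["SBML"],
)
parsed = json.loads(record.to_json())
print(missing_elements(parsed))  # → []
```

A presence check like `missing_elements` is only a first step: whether the identifier resolves or the standard is really followed needs richer tests.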
17. FAIR Principles reality check
The FAIR Principles are:
• An aspiration, a journey.
• A call for machine actionability of data and metadata.
• Ambiguous.
• Work in progress.
• A subset of indicators:
  – ROI, impact, community need, sustainability of repository, quality of service…
The FAIR Principles are not:
• A standard.
• Strict.
• Just about humans being able to find, access, reformat and finally reuse data.
• Technology specific.
• Domain specific.
• Tablets of stone.
Mons et al, Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use 37, 1-8. 10.3233/ISU-170824.
Dunning et al, Are the FAIR Data Principles fair? IDCC17
18. Let’s measure it!
Framework for metrics
Automated services
Manual services
Authorities
Wilkinson et al, Evaluating FAIR Maturity Through a Scalable, Automated, Community-Governed Framework
https://doi.org/10.1101/649202
19. Let’s measure it!
Dunkelziffer (German for the hidden, unrecorded figure)
“Not everything that
can be counted counts.
Not everything that
counts can be counted”
[William Bruce Cameron]
21. Indicators: Robustness, Humility, Transparency, Diversity, Reflexivity*
Context dependency
Community standards
Incremental
Matrix of metrics + maturity levels for each
*The Metric Tide, https://responsiblemetrics.org/the-metric-tide/
F and A are not so bad
I and R are hard
A FAIR ecosystem means…
FAIR indicators, models and trust
Transparent evaluation
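A “matrix of metrics” with maturity levels can be kept as plain data, so that scores stay per indicator rather than being collapsed into one number, and each community can set its own targets (context dependency) and raise them over time (incremental). The sketch below assumes a hypothetical 0 to 3 maturity scale and made-up indicator names and levels; it is not a published indicator set.

```python
from collections import defaultdict
from statistics import mean

# One row per indicator: (FAIR letter, indicator name, achieved maturity level).
# The 0-3 scale, indicator names and levels are hypothetical illustrations.
MATRIX = [
    ("F", "persistent identifier", 3),
    ("F", "rich metadata", 2),
    ("A", "standard access protocol", 3),
    ("I", "community vocabulary", 1),
    ("R", "licence", 2),
    ("R", "provenance", 0),
]


def summarise(matrix):
    """Average maturity per FAIR letter; the per-indicator rows stay intact."""
    by_letter = defaultdict(list)
    for letter, _name, level in matrix:
        by_letter[letter].append(level)
    return {letter: float(mean(levels)) for letter, levels in sorted(by_letter.items())}


def gaps(matrix, targets):
    """Indicators below the target level chosen by a given community."""
    return [name for _letter, name, level in matrix if level < targets.get(name, 0)]


print(summarise(MATRIX))  # → {'A': 3.0, 'F': 2.5, 'I': 1.0, 'R': 1.0}
print(gaps(MATRIX, {"community vocabulary": 2, "licence": 1, "provenance": 2}))
# → ['community vocabulary', 'provenance']
```

Keeping targets separate from measured levels means the same matrix can be judged differently by different communities without re-measuring.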
22. Capability Maturity Model of entities & their capabilities
Indicators and metrics measuring levels
Foundational components: Awareness and Policy; Standards and Guidelines; People; Infrastructure
FAIRification process: Value Based Assessment; Selection; Goal Setting; Process planning; Modelling; Transformation; Publishing
Implementation outcome (dataset): Persistent Identification; Data Set Discovery; Machine Readability; Data Access and Usage; Preservation and Sustainability
RDA FAIR Data Maturity Model Working Group
Cataloguing the FAIR ecosystem
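One common way to read a capability maturity model is as stages: an entity is at level n only when the capabilities for level n and every level below it are in place. The sketch below assumes that staged reading and a hypothetical assignment of some of the capabilities named above to levels; the RDA working group’s actual model is considerably more elaborate.

```python
# Hypothetical mapping of capabilities (names taken from the slide) to levels.
LEVELS = {
    1: {"awareness and policy", "people"},
    2: {"standards and guidelines", "infrastructure"},
    3: {"persistent identification", "machine readability"},
    4: {"preservation and sustainability"},
}


def maturity_level(achieved, levels=LEVELS):
    """Staged model: count a level only if it and all lower levels are complete."""
    reached = 0
    for level in sorted(levels):
        if not levels[level] <= achieved:  # subset test: all required capabilities present?
            break
        reached = level
    return reached


project = {"awareness and policy", "people", "infrastructure",
           "persistent identification", "machine readability"}
print(maturity_level(project))  # → 1 (level 2 lacks standards and guidelines)
print(maturity_level(project | {"standards and guidelines"}))  # → 3
```

The staged reading explains why a project with advanced capabilities can still sit at a low level: one missing foundation caps everything above it.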
23. What do we mean by a Maturity Model?
[Susheel Varma]
Only way more elaborate…
24. FAIR Evaluator Workflows [Wilkinson et al, 2019]
Rubrics, indicators and tests are FAIR objects and community decisions
https://doi.org/10.1101/649202
Scale up and scale out automation of indicators and their evaluation…
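Treating rubrics and tests as identifiable objects is what lets evaluation be automated and scaled out. A minimal sketch of the idea, assuming hypothetical test URIs and simple local functions in place of the networked test services that, as we understand the cited preprint, the FAIR Evaluator uses: a rubric is a list of test identifiers, and the evaluator runs each one against a metadata record and returns a report that is itself machine-readable.

```python
import json
from typing import Callable, Dict, List

MetricTest = Callable[[dict], bool]

# Hypothetical test identifiers; in a real evaluator each test would itself be
# a FAIR, identifiable object rather than a local lambda.
TESTS: Dict[str, MetricTest] = {
    "https://example.org/tests/has-identifier":
        lambda md: bool(md.get("identifier")),
    "https://example.org/tests/http-identifier":
        lambda md: str(md.get("identifier", "")).startswith(("http://", "https://")),
    "https://example.org/tests/has-licence":
        lambda md: bool(md.get("license")),
}


def evaluate(rubric: List[str], metadata: dict) -> dict:
    """Run every test named in a rubric and return a machine-readable report."""
    results = {test_id: TESTS[test_id](metadata) for test_id in rubric}
    return {
        "rubric": rubric,
        "results": results,
        "passed": sum(results.values()),
        "total": len(results),
    }


report = evaluate(list(TESTS), {"identifier": "https://example.org/id/dataset-42"})
print(json.dumps(report, indent=2))
```

Because the rubric is just data, a community can publish, version and reuse it like any other FAIR object.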
33. Privacy Preservation of data
data bookkeeping
https://f1000research.com/posters/7-1036
https://www.monarc.lu
[Pinar Alper]
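Data bookkeeping means knowing what personal data you hold and under which terms it may be used. As a minimal sketch, assuming a hypothetical register entry with a set of consented purposes and a flag for whether the data may leave the site (not the data model of any tool linked above), a proposed use can be checked before any data moves:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """One row of a hypothetical data register."""

    name: str
    contains_personal_data: bool
    consented_purposes: FrozenSet[str]
    may_leave_site: bool


def may_use(entry: DatasetEntry, purpose: str, off_site: bool) -> Tuple[bool, str]:
    """Check a proposed use against the recorded terms and explain any refusal."""
    if purpose not in entry.consented_purposes:
        return False, f"purpose '{purpose}' not covered by consent"
    if off_site and entry.contains_personal_data and not entry.may_leave_site:
        return False, "personal data must stay on site"
    return True, "permitted"


cohort = DatasetEntry("liver-cohort", True, frozenset({"liver disease research"}), False)
print(may_use(cohort, "liver disease research", off_site=False))  # → (True, 'permitted')
print(may_use(cohort, "liver disease research", off_site=True))
# → (False, 'personal data must stay on site')
```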
34. Privacy Preservation of analysis
take (distributed) analysis to the (distributed) data
https://www.health-ri.org/
Personal Health Train
Collect privacy-sensitive data using mobile containers
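The Personal Health Train idea is that the analysis travels to the data and only results travel back. A minimal sketch, assuming two hypothetical stations with made-up values and an in-process function call standing in for a shipped container: each station computes a count and a sum locally, and only those summaries are combined into a global mean.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Summary = Tuple[int, float]


@dataclass
class Station:
    """A data holder; its records are only ever read inside run()."""

    name: str
    records: List[float]

    def run(self, train: Callable[[List[float]], Summary]) -> Summary:
        # The "train" (the analysis) executes where the data lives.
        return train(self.records)


def local_summary(values: List[float]) -> Summary:
    # Only an aggregate leaves the station, never the individual records.
    return len(values), sum(values)


def federated_mean(stations: List[Station]) -> float:
    partials = [station.run(local_summary) for station in stations]
    total_n = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    return total / total_n


stations = [Station("hospital-a", [1.0, 2.0, 3.0]), Station("hospital-b", [4.0, 5.0])]
print(federated_mean(stations))  # → 3.0
```

Even aggregates can reveal information about very small groups, so real deployments add further safeguards on what a train may return.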
35. Regulatory Practice: robust, safe exchange and reuse of HTS computational analytical workflows
http://biocomputeobject.org
IEEE P2791 BioCompute Working Group
[Vahan Simonyan]
36. BioCompute Framework to advance Regulatory Science to support NGS analysis
Emphasis on robust, safe reuse.
• Describe and validate the metadata of packages, and their contents, both inside and outside
• Standardise data formats and elements and exchange of Electronic Health Records
• Describe and validate analysis workflows, to be portable and interoperable
• Standardise and support sharing and analysis of genomic data
Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al “Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results” PLOS Biology 2018
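Describing and validating a computational record can be illustrated with a skeletal BioCompute-style object. The sketch assumes the top-level domain names as we recall them from IEEE 2791 and an assumed etag rule (SHA-256 over the object’s content, excluding the header fields); both should be checked against the official BioCompute schema, and the pipeline content is hypothetical. It shows two checks: that the expected domains are present, and that the content has not changed since the etag was computed.

```python
import hashlib
import json

# Domain names as we understand the IEEE 2791 schema; verify the required set
# against the official JSON Schema before relying on this list.
REQUIRED_DOMAINS = ("provenance_domain", "usability_domain", "description_domain",
                    "execution_domain", "io_domain")
HEADER_FIELDS = ("object_id", "spec_version", "etag")


def compute_etag(bco: dict) -> str:
    """Assumed rule: SHA-256 of canonical JSON of everything except the header fields."""
    body = {key: value for key, value in bco.items() if key not in HEADER_FIELDS}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(bco: dict) -> list:
    problems = [f"missing {domain}" for domain in REQUIRED_DOMAINS if domain not in bco]
    if bco.get("etag") != compute_etag(bco):
        problems.append("etag does not match content")
    return problems


bco = {
    "object_id": "https://example.org/bco/0001",  # hypothetical identifier
    "spec_version": "hypothetical-spec-version",
    # Domain contents heavily abbreviated for illustration.
    "provenance_domain": {"name": "Example variant-calling pipeline", "version": "1.0"},
    "usability_domain": ["Illustrative example only"],
    "description_domain": {"pipeline_steps": []},
    "execution_domain": {"script": []},
    "io_domain": {"input_subdomain": [], "output_subdomain": []},
}
bco["etag"] = compute_etag(bco)
print(validate(bco))  # → []
bco["usability_domain"].append("edited after the etag was computed")
print(validate(bco))  # → ['etag does not match content']
```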
37. Research Object Framework
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
• Self-describing, machine-processable metadata, both common and specific to different object types.
• Bundle together references or the objects themselves, and relate digital resources.
• Snapshot | cite | exchange
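The core of a Research Object is a manifest that aggregates some resources by value and others by reference, with metadata describing the whole. A minimal sketch with hypothetical keys and content; concrete Research Object specifications such as RO-Crate use JSON-LD and their own vocabulary, which this sketch does not follow.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


def build_ro_manifest(title: str, local_files: Dict[str, bytes], references: List[str]) -> dict:
    """Bundle objects (by checksum) and references (by URI) in one self-describing record."""
    aggregates = [
        {"path": path, "sha256": hashlib.sha256(content).hexdigest()}
        for path, content in sorted(local_files.items())
    ]
    aggregates += [{"uri": uri} for uri in references]  # referenced, not copied
    return {
        "title": title,
        "created": datetime.now(timezone.utc).isoformat(),  # when the snapshot was taken
        "aggregates": aggregates,
    }


manifest = build_ro_manifest(
    "Example model and data",  # hypothetical content below
    {"model.xml": b"<sbml/>", "data.csv": b"t,x\n0,1\n"},
    ["https://fairdomhub.org/projects/129"],
)
print([entry.get("path", entry.get("uri")) for entry in manifest["aggregates"]])
# → ['data.csv', 'model.xml', 'https://fairdomhub.org/projects/129']
```

Checksums make the snapshot citable and verifiable; references keep large or external resources out of the bundle itself.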
38. COMBINE was early to the party…
COMBINE Archive
Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D, SEMS, University of Rostock
Zip-like file with a manifest & metadata
- Bundle files
- Keep provenance
- Exchange data
- Ship results
Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
https://sems.uni-rostock.de/projects/combinearchive/
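A COMBINE archive is a ZIP file whose manifest.xml lists each entry with a format identifier. The sketch below writes and reads one in memory with Python’s standard library; the format URIs are as we recall them from the OMEX specification and should be checked against it, and the one-line model file is a placeholder rather than valid SBML.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
OMEX_FORMAT = "http://identifiers.org/combine.specifications/omex"
SBML_FORMAT = "http://identifiers.org/combine.specifications/sbml"


def omex_manifest(entries: List[Tuple[str, str]]) -> bytes:
    """Build manifest.xml: the archive itself, the manifest, then each listed file."""
    root = ET.Element("omexManifest", xmlns=OMEX_NS)
    ET.SubElement(root, "content", location=".", format=OMEX_FORMAT)
    ET.SubElement(root, "content", location="./manifest.xml", format=OMEX_NS)
    for location, fmt in entries:
        ET.SubElement(root, "content", location=location, format=fmt)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(files: Dict[str, Tuple[bytes, str]]) -> bytes:
    """files maps a path inside the archive to (content, format URI)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        entries = [(f"./{path}", fmt) for path, (_, fmt) in files.items()]
        archive.writestr("manifest.xml", omex_manifest(entries))
        for path, (content, _) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def list_contents(archive_bytes: bytes) -> List[Tuple[str, str]]:
    """Read the manifest back and return (location, format) pairs."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read("manifest.xml"))
    return [(c.get("location"), c.get("format")) for c in root.iter(f"{{{OMEX_NS}}}content")]


omex = write_archive({"model.xml": (b"<sbml/>", SBML_FORMAT)})
for location, fmt in list_contents(omex):
    print(location, fmt)
```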
39. Big data distributed over multiple locations, efficiently and safely moved on demand
ROs are verified collections of references [Chard, et al 2016]
FAIR Research Objects
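“Verified collections of references” means the bundle records what each remote item should contain, so that content fetched on demand can be checked. As we understand it, the cited work (BDBag) builds on the BagIt packaging format, which pairs checksum manifests with a list of remote files to fetch; the sketch below only mirrors that idea with a hypothetical in-memory store and is not the BDBag tooling.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def verify(manifest: Dict[str, str], fetch: Callable[[str], Optional[bytes]]) -> List[str]:
    """Return the references whose content is missing or fails its recorded SHA-256."""
    failures = []
    for uri, expected in manifest.items():
        content = fetch(uri)
        if content is None or hashlib.sha256(content).hexdigest() != expected:
            failures.append(uri)
    return failures


# Hypothetical remote store, standing in for data fetched on demand.
store = {
    "https://example.org/data/a.csv": b"t,x\n0,1\n",
    "https://example.org/data/b.csv": b"t,x\n0,2\n",
}
manifest = {
    "https://example.org/data/a.csv": hashlib.sha256(b"t,x\n0,1\n").hexdigest(),
    "https://example.org/data/b.csv": hashlib.sha256(b"t,x\n0,99\n").hexdigest(),  # stale checksum
    "https://example.org/data/c.csv": hashlib.sha256(b"").hexdigest(),  # not in the store
}
print(verify(manifest, store.get))
# → ['https://example.org/data/b.csv', 'https://example.org/data/c.csv']
```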
40. The Knowledge Object Reference Ontology (KORO): A formalism to support management and sharing of computable biomedical knowledge for learning health systems
Flynn, Friedman, Boisvert, Landis-Lewis, Lagoze (2018), https://doi.org/10.1002/lrh2.10054
Graphs of ROs
Track ROs
Combine and enrich ROs
Learning Health Systems and Research Objects
41. EOSC-Life: FAIR data and tools (workflows, models) for cloud use
RI data (distributed over facilities)
Ecosystem of innovative tools in EOSC
Publish FAIR life science data in EOSC
Data Catalogues
Tools Catalogues
Workflow Catalogues
Service Catalogues
[Niklas Blomberg]
43. FAIR Challenges for Projects
Track collection of data and metadata X X X
Maintain experimental context X X
Find and exchange assets X X X X
Long-term retain results beyond a project X X X
Share, disseminate and publish assets sensitively X X X
Consistently report for interpretation, interoperability & comparison X X
Promote standardised metadata practices X X
Organise and link assets X X X
Reuse tools and community archives X X
Integrate with other data stores and platforms X X X
Support reproducible publications X X X X
Credit owners X X
49. FAIR(ish) after death…
https://fairdomhub.org/projects/129
https://wellcomeopenresearch.org/articles/4-104/v1
Zielinski, Hay, Millar, The grant is dead, long live the data - migration as a pragmatic exit strategy for research data preservation, Wellcome Open Research 4:104
50. Data Sovereignty: FAIR but not yet Open
A Project Commons, not an integrated data warehouse
51. Data Sovereignty: FAIR but never Open [Mueller]
e.g. (Pillar III): patient-related clinical data held in-house; aggregated data available to all of LiSyM and to external tools via APIs
52. Data Sovereignty: Personal Health Tram
Less automatic, more transparent, when partners cannot share
Share table structure
Share common code
Share summaries
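When partners cannot share records, they can still agree a table structure, run the same code, and exchange only summaries. A minimal sketch under hypothetical assumptions: made-up column names, a made-up local table, and a small-count suppression threshold of 5 chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, List, Union

# Agreed by all partners in advance; column names are hypothetical.
SHARED_COLUMNS = {"patient_id", "sex", "diagnosis"}
MIN_CELL = 5  # hypothetical threshold: smaller counts are not released


def follows_shared_structure(rows: List[dict]) -> bool:
    return all(set(row) == SHARED_COLUMNS for row in rows)


def summarise(rows: List[dict], column: str) -> Dict[str, Union[int, str]]:
    """Common code every partner runs locally; only this summary is shared."""
    if not follows_shared_structure(rows):
        raise ValueError("rows do not follow the shared table structure")
    counts = Counter(row[column] for row in rows)
    return {value: (n if n >= MIN_CELL else f"<{MIN_CELL}") for value, n in sorted(counts.items())}


local_rows = (
    [{"patient_id": i, "sex": "F", "diagnosis": "NAFLD"} for i in range(6)]
    + [{"patient_id": 6 + i, "sex": "M", "diagnosis": "HCC"} for i in range(2)]
)
print(summarise(local_rows, "diagnosis"))  # → {'HCC': '<5', 'NAFLD': 6}
```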
54. FAIR at the First Mile
Project Commons / Integrated Data Warehouse [Christian R Bauer]
55. EU-STANDS4PM: First and Last Mile
Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship
FAIR at last mile
FAIR at first mile / source
FAIR Protected Data/Compute
FAIR Objects
FAIRification
56. EU-STANDS4PM: a FAIR path through the jungle
Indicators and Maturity Models: obtainable & understandable
Technical infrastructure & Stewardship Skills: possible
User Experience: easy (or at least feasible)
Communities & Culture: normative
Incentives: rewarding
Policies: required
Based on Matt Spritzer’s figure, COS
57. Acknowledgements
FAIRDOM Team
– http://www.fair-dom.org
Research Object Team
– http://www.researchobject.org
BioCompute Object
– http://biocomputeobject.org/
FAIR folks, esp. FAIRplus and FAIR Metrics
– https://fairplus-project.eu/
– http://www.fairmetrics.org
Common Workflow Language
– http://www.commonwl.org
ELIXIR
– http://www.elixir-europe.org
BioExcel
– http://bioexcel.eu