
Let’s go on a FAIR safari!

COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, are now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations, and about the portability and sustainability of processing methods, to enable transparent and reproducible results. All this takes place within the context of a bottom-up society of collaborating (or burdened?) scientists, a top-down collective of compliance-focused funders and policy makers, and an in-the-middle posse of e-infrastructure providers.

Making the FAIR principles a reality is tricky. They are aspirations, not standards. They are multi-dimensional and depend on context, such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile”: “FAIRifying” destination community archive repositories and measuring their “compliance” with FAIR metrics (or, less controversially, “indicators”). But what about FAIR at the first mile, at source? How do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? What about FAIR beyond just data, such as exchanging and reusing pipelines for precision medicine?

Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and on developing a FAIR asset Commons for multi-partner research projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of high-throughput sequencing (HTS) precision medicine pipelines.

So, using our FAIRDOM and BioCompute Object binoculars, let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are with FAIR personalised medicine.


References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org


  1. Let’s go on a FAIR Safari! Prof Carole Goble, The FAIRDOM Consortium, ELIXIR UK Head of Node, BioComputeObject Partnership, The University of Manchester, UK. carole.goble@manchester.ac.uk. COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany, 18 July 2019
  2. A European standardization framework for data integration and data-driven in silico models for personalised medicine: harmonised transnational standards, recommendations and guidelines that allow a broad application of predictive in silico methodologies in personalised medicine across Europe.
  3. A European standardization framework for data integration and data-driven in silico models for personalised medicine
  4. Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18. A potted history: many went before. 2014 - Lorentz workshop. 2015 - BioHackathon. 2016 - Published. Went bananas. Grassroots activity that has become a top-down one.
  5. Sharing/publishing assets in public archives… Data, Models (*top three most popular). The evolution of standards and data management practices in systems biology (2015). Stanford et al, Molecular Systems Biology, 11(12):851
  6. … model reuse is tricky… Stanford et al, The evolution of standards and data management practices in systems biology, Molecular Systems Biology (2015) 11: 851 DOI 10.15252/msb.20156053. COMBINE sessions on Reproducibility
  7. ... different repositories, owners, sovereignties, infrastructure, platforms … The evolution of standards and data management practices in systems biology (2015). Stanford et al, Molecular Systems Biology, 11(12):851. A jungle. An ecosystem.
  8. http://genexplain.com/mypathsem/
  9. FAIR for the widest possible use, from EHR to Research …
  10. Just how feasible is it to interoperate (integrate) and reuse data collected for another purpose in another domain?
  11. The FAIR Jungle
  12. The FAIR Hype: Clarity, Infrastructure, Methodologies, Incentives
  13. Cutting a path through the jungle…. PEST – political, economic, social, technical. What does it mean to be FAIR? What is the cost/benefit analysis?
  14. Let’s examine FAIR more closely….
  15. The FAIR principles in the paper… which some people seem to have taken as the law of the jungle…
  16. FAIR Principles: machine-actionable data and metadata. Findable, Accessible, Interoperable, Reusable. Find: with machine-readable metadata. Locate and identify: with a standard identification mechanism. Available and obtainable: human & machine; metadata always. STANDARDS: semantically encoded, syntactically parsable; references. Sufficiently described: provenance, least restrictive licenses, community compliant. Increase exchange, integration and reuse across disciplines and borders.
  17. FAIR Principles reality check. Are: • An aspiration, a journey. • A call for machine actionability of data and metadata. • Ambiguous. • Work in progress. • A subset of indicators: – ROI, impact, community need, sustainability of repository, quality of service…. Are not: • A standard. • Strict. • Just about humans being able to find, access, reformat and finally reuse data. • Technology specific. • Domain specific. • Tablets of stone. Mons et al, Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use. 37. 1-8. 10.3233/ISU-170824. Dunning et al, Are the FAIR Data Principles fair? IDCC17
  18. Let’s measure it! Framework for metrics. Automated services. Manual services. Authorities. Wilkinson et al, Evaluating FAIR Maturity Through a Scalable, Automated, Community-Governed Framework https://doi.org/10.1101/649202
  19. Let’s measure it! Dunkelziffer (the hidden, unreported figure). “Not everything that can be counted counts. Not everything that counts can be counted” [William Bruce Cameron]
  20. Compliance. Awareness, expectation setting, self-evaluation, reporting: by providers, users & community. Certification, judgement, regulation: by whom??? Comparison, monitoring, review, quality: by community contract. http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/ https://fairshake.cloud/
  21. Indicators: Robustness, Humility, Transparency, Diversity, Reflexivity*. Context dependency. Community standards. Incremental. Matrix of metrics. Maturity levels for each. *The Metric Tide, https://responsiblemetrics.org/the-metric-tide/. F and A are not so bad; I and R are hard. A FAIR Ecosystem means.... FAIR indicators, models and trust. Transparent evaluation.
  22. Capability Maturity Model of entities & their capabilities. Indicators and metrics measuring levels. Foundational Components: Awareness and Policy, Standards and Guidelines, People, Infrastructure. FAIRification Process: Value Based Assessment, Selection, Goal Setting, Process planning, Modelling, Transformation, Publishing. Impl. Outcome (Dataset): Persistent Identification, Data Set Discovery, Machine Readability, Data Access and Usage, Preservation and Sustainability. RDA FAIR Data Maturity Model Working Group. Cataloguing the FAIR ecosystem.
  23. What do we mean by a Maturity Model? [Susheel Varma] Only way more elaborate ….
  24. [Wilkinson et al, 2019] FAIR Evaluator Workflows. Rubrics, Indicators and Tests are FAIR objects and community decisions. https://doi.org/10.1101/649202 Scale up and scale out automation of indicators and their evaluation…
  25. FAIRification Pipelines. Rare Disease. https://www.go-fair.org/fair-principles/fairification-process/ [Marco Roos]
  26. FAIRification Cookbooks … for models? https://fairplus-project.eu/
  27. More than just data: Software, models, workflows, SOPs, Lab Protocols…. FAIR Digital Objects
  28. FAIR Models: properties of data + software. FAIR Software, FAIR Workflows*: Maintainability, Testing, Portability, Composite structure, Forms (spec or code?), Versioning, Executability, Maturity models, Contributor policy, Identity, Copyright, Licenses, Documentation, Sustainability. Model Reproducibility & Exchange. *FAIR Computational Workflows https://doi.org/10.5281/zenodo.3268653
  29. FAIR Precision Medicine Models … Indicators & Maturity Model eXchange
  30. Standards, Identifiers, AAI & Licensing, Repositories, Search, DMP, Policies, Governance. Cloud of registries. Federation.
  31. Scale out mark-up for federation. https://eosc-edmi.github.io/ http://bioschemas.org EOSC Dataset Minimum Information
  32. FAIR, OPEN, SAFE: privacy preservation, regulatory rigour, crossing domain and sovereignty boundaries
  33. Privacy Preservation of data: data book keeping. https://f1000research.com/posters/7-1036 https://www.monarc.lu [Pinar Alper]
  34. Privacy Preservation of analysis: take (distributed) analysis to the (distributed) data. https://www.health-ri.org/ Personal Health Train. Collect privacy-sensitive data using mobile containers.
  35. Regulatory Practice: robust, safe exchange and reuse of HTS computational analytical workflows. http://biocomputeobject.org IEEE P2791 BioCompute Working Group [Vahan Simonyan]
  36. BioCompute Framework to advance Regulatory Science to support NGS analysis. Emphasis on robust, safe reuse. Describe and validate the metadata of packages, and their contents, both inside and outside. Standardise data formats and elements and exchange of Electronic Health Records. Describe and validate analysis workflows, to be portable and interoperable. Standardise and support sharing and analysis of Genomic data. Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al, “Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results”, PLOS Biology 2018
  37. Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004. Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/. Self-describing, machine-processable metadata, in common and specific to different object types. Bundle together references or the objects themselves. Relate digital resources. Snapshot | cite | exchange. Research Object Framework
  38. COMBINE was early to the party…. COMBINE Archive: Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D, SEMS, University of Rostock. A zip-like file with a manifest & metadata: bundle files, keep provenance, exchange data, ship results. Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1. https://sems.uni-rostock.de/projects/combinearchive/
  39. Big data distributed over multiple locations, efficiently and safely moved on demand. ROs are verified collections of references [Chard et al, 2016]. FAIR Research Objects
  40. The Knowledge Object Reference Ontology (KORO): A formalism to support management and sharing of computable biomedical knowledge for learning health systems. Flynn, Friedman, Boisvert, Landis‐Lewis, Lagoze (2018), https://doi.org/10.1002/lrh2.10054. Graphs of ROs. Track ROs. Combine and enrich ROs. Learning Health Systems and Research Objects
  41. EOSC-Life: FAIR data and tools (workflows, models) for cloud use. RI data (distributed over facilities). Ecosystem of innovative tools in EOSC. Publish FAIR life science data in EOSC. Data Catalogues, Tools Catalogues, Workflow Catalogues, Service Catalogues [Niklas Blomberg]
  42. FAIR Challenges for Projects: Track collection of data and metadata X X X; Maintain experimental context X X; Find and exchange assets X X X X; Long-term retain results beyond a project X X X; Share, disseminate and publish assets sensitively X X X; Consistently report for interpretation, interoperability & comparison X X; Promote standardised metadata practices X X; Organise and link assets X X X; Reuse tools and community archives X X; Integrate with other data stores and platforms X X X; Support reproducible publications X X X X; Credit owners X X
  43. Public Project Commons Platform. Service hosted at HITS. 50+ installations. 140+ projects. Support
  44. A Commons: Project, Investigations and Assets. Simulate model. Launch workflows.
  45. Models, SOPs, People, Projects, Publications, Documents, Presentations, Workflows, Data, Events. Federated Catalogue, Integrated view: interlinked objects, structured organisation, resources ecosystem. Investigations, Studies, Assays/Analyses
  46. Workflows. Federated Catalogue, Integrated view: interlinked objects, structured organisation, resources ecosystem. Stores, Archives. FAIR Membrane. Investigations, Studies, Assays/Analyses
  47. Federated Catalogue, Integrated view. A Commons is only as FAIR as the content (inside and outside). FAIR Membrane
  48. FAIR(ish) after death …. https://fairdomhub.org/projects/129 https://wellcomeopenresearch.org/articles/4-104/v1 Zielinski, Hay, Millar, The grant is dead, long live the data - migration as a pragmatic exit strategy for research data preservation
  49. Data Sovereignty: FAIR but not yet Open. A Project Commons, not an integrated data warehouse
  50. e.g. (Pillar III) in-house in-house All LiSyM Patient-related clinical data Aggregated data API External Tools API Data Sovereignty: FAIR but never Open [Mueller]
  51. Data Sovereignty: Personal Health Tram. Less automatic, more transparent, when partners cannot share. Share table structure. Share common code. Share summaries.
  52. FAIR at the First Mile [Christian R Bauer]
  53. FAIR at the First Mile: Project Commons, Integrated Data Warehouse [Christian R Bauer]
  54. EU-STANDS4PM: First and Last Mile. Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship. FAIR at last mile. FAIR at first mile / source. FAIR Protected Data/Compute. FAIR Objects. FAIRification.
  55. EU-STANDS4PM FAIR path through the jungle. Indicators and Maturity Models obtainable & understandable Technical infrastructure & Stewardship Skills possible Communities & Culture easy (or at least feasible) User Experience normative rewarding Incentives required Policies. Based on Matt Spritzer’s figure, COS
  56. Acknowledgements: FAIRDOM Team – http://www.fair-dom.org; Research Object Team – http://www.researchobject.org; BioComputeObject – http://biocomputeobject.org/; FAIR folks, esp. FAIRplus and FAIR Metrics – https://fairplus-project.eu/, http://www.fairmetrics.org; Common Workflow Language – http://www.commonwl.org; ELIXIR – http://www.elixir-europe.org; BioExcel – http://bioexcel.eu
  57. Acknowledgements
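Slides 18 and 24 describe automated FAIR evaluation, in which indicator tests are themselves shareable objects that run over a resource's metadata. The sketch below illustrates that idea under stated assumptions. A metadata record is taken to be a plain Python dict. The field names (`identifier`, `keywords`, `license`, `creator`, `dateCreated`) and the pass rules are hypothetical. These are not the FAIR Evaluator's actual tests or API. Only the principle labels (F1, I2, R1.1, R1.2) come from the FAIR principles themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a metadata record is modelled as a plain dict.
Record = Dict[str, object]


@dataclass(frozen=True)
class Indicator:
    principle: str                    # FAIR principle the test speaks to, e.g. "F1"
    description: str
    test: Callable[[Record], bool]    # the community-agreed, shareable test


def has_persistent_id(rec: Record) -> bool:
    # Assumption: DOIs and identifiers.org URIs count as persistent identifiers.
    pid = str(rec.get("identifier", ""))
    return pid.startswith(("https://doi.org/", "https://identifiers.org/"))


def uses_vocabulary(rec: Record) -> bool:
    # Assumption: keywords given as term URIs count as "semantically encoded".
    keywords = rec.get("keywords") or []
    return bool(keywords) and all(str(k).startswith("http") for k in keywords)


def has_license(rec: Record) -> bool:
    return bool(rec.get("license"))


def has_provenance(rec: Record) -> bool:
    # Assumption: a creator plus a creation date is the minimum provenance.
    return bool(rec.get("creator")) and bool(rec.get("dateCreated"))


INDICATORS: List[Indicator] = [
    Indicator("F1", "Globally unique, persistent identifier", has_persistent_id),
    Indicator("I2", "Metadata use FAIR vocabularies", uses_vocabulary),
    Indicator("R1.1", "Clear usage license", has_license),
    Indicator("R1.2", "Detailed provenance", has_provenance),
]


def evaluate(rec: Record, indicators: List[Indicator] = INDICATORS) -> Dict[str, bool]:
    """Run every indicator test against one record; report pass/fail per principle."""
    return {ind.principle: ind.test(rec) for ind in indicators}


def score(report: Dict[str, bool]) -> float:
    """Fraction of indicators passed: a crude summary, not a maturity level."""
    return sum(report.values()) / len(report) if report else 0.0
```

Consider a record with a DOI-style identifier, a license and term-URI keywords but no creation date. It passes F1, I2 and R1.1 and fails R1.2. Keeping each test as a separate named object mirrors the slide's point: tests are community decisions that can be swapped or versioned independently.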
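Slide 31 proposes scaling out FAIR through lightweight mark-up embedded in web pages (Bioschemas, EOSC Dataset Minimum Information). The sketch below emits a schema.org `Dataset` description as JSON-LD. The properties used (`name`, `description`, `identifier`, `license`, `keywords`, `url`) are standard schema.org properties. However, this sketch does not encode which of them a given Bioschemas profile or the EOSC minimum information actually requires. For that reason the `required` list is passed in by the caller and should be taken from those specifications. All example values are made up.

```python
import json
from typing import Iterable, List


def dataset_jsonld(name: str, description: str, identifier: str,
                   license_url: str, keywords: Iterable[str], url: str) -> dict:
    """Describe a dataset with standard schema.org properties, as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "license": license_url,
        "keywords": list(keywords),
        "url": url,
    }


def missing_properties(doc: dict, required: Iterable[str]) -> List[str]:
    """List required properties that are absent or empty (caller supplies the required set)."""
    return [p for p in required if not doc.get(p)]


def as_script_tag(doc: dict) -> str:
    """Wrap the JSON-LD in the <script> element used to embed it in an HTML page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(doc, indent=2)
            + "\n</script>")
```

Embedding the same small description in every catalogue page lets independent harvesters build a federated index without a shared database.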
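Slides 34 and 51 describe taking the analysis to the data. Partners who cannot share records agree on a table structure and common code, run that code locally, and return only summaries. The sketch below simulates this pattern in a single process by computing a mean across sites. Several details are hypothetical: the column names, the `MIN_COUNT` suppression rule and the functions themselves. None of them is part of the Personal Health Train specification. Only counts and sums ever leave a "site".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Agreed table structure shared by all partners (hypothetical column names).
SHARED_SCHEMA: Dict[str, type] = {"patient_id": str, "measurement": float}

# Assumed disclosure-control rule: do not release summaries of fewer rows.
MIN_COUNT = 5


@dataclass
class Summary:
    count: int
    total: float


def matches_schema(rows: List[dict]) -> bool:
    """Check local rows against the shared table structure before running shared code."""
    return all(
        set(row) == set(SHARED_SCHEMA)
        and all(isinstance(row[col], typ) for col, typ in SHARED_SCHEMA.items())
        for row in rows
    )


def local_summary(rows: List[dict], column: str = "measurement") -> Optional[Summary]:
    """Common code executed inside each site; returns aggregates, never rows."""
    if not matches_schema(rows):
        raise ValueError("local table does not match the shared structure")
    values = [row[column] for row in rows]
    if len(values) < MIN_COUNT:
        return None  # too few records to release safely
    return Summary(count=len(values), total=sum(values))


def pooled_mean(summaries: List[Optional[Summary]]) -> Optional[float]:
    """Combine the released summaries; suppressed sites (None) contribute nothing."""
    released = [s for s in summaries if s is not None]
    n = sum(s.count for s in released)
    return sum(s.total for s in released) / n if n else None
```

Returning (count, total) pairs instead of per-site means lets the coordinator weight sites correctly. Suppressing small sites reflects the "less automatic, more transparent" trade-off on slide 51.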
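Slides 37 and 39 describe Research Objects in two ways. They are self-describing bundles that relate resources by reference, and they are "verified collections of references". The sketch below illustrates only the verification idea. A manifest records each aggregated resource's URI with a SHA-256 checksum. A later check re-resolves each reference and reports whether it is unchanged, changed or missing. The manifest layout and the `fetch` callback (which returns bytes, or `None` when a reference cannot be resolved) are hypothetical. Actual Research Object specifications define their own metadata formats.

```python
import hashlib
from typing import Callable, Dict, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def snapshot(resources: Dict[str, bytes], title: str) -> dict:
    """Record each aggregated resource by reference (URI) together with its checksum."""
    return {
        "title": title,
        "aggregates": [
            {"uri": uri, "sha256": sha256_hex(data)}
            for uri, data in sorted(resources.items())
        ],
    }


def verify(manifest: dict, fetch: Callable[[str], Optional[bytes]]) -> Dict[str, str]:
    """Re-resolve every reference; report 'ok', 'changed' or 'missing' per URI."""
    status: Dict[str, str] = {}
    for entry in manifest["aggregates"]:
        data = fetch(entry["uri"])
        if data is None:
            status[entry["uri"]] = "missing"
        elif sha256_hex(data) == entry["sha256"]:
            status[entry["uri"]] = "ok"
        else:
            status[entry["uri"]] = "changed"
    return status
```

Checksumming by reference means big data can stay where it is (slide 39) while the bundle can still prove that what it cites has not silently drifted.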
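Slide 38 describes the COMBINE archive (OMEX): a zip-like file with a manifest that lists every bundled file and its format. The sketch below builds and reads such an archive using only Python's standard `zipfile` and `xml.etree.ElementTree` modules. Some details reflect our reading of the OMEX specification rather than anything on the slide: the manifest namespace, the element and attribute names (`omexManifest`, `content`, `location`, `format`) and the identifiers.org format URIs. Check them against the published specification before relying on them. The model file is a placeholder, not valid SBML. Dedicated libraries such as libCombine are the usual route in practice.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List, Tuple

# Namespace and format URIs as we understand the OMEX specification (verify before use).
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
ARCHIVE_FMT = "http://identifiers.org/combine.specifications/omex"
MANIFEST_FMT = "http://identifiers.org/combine.specifications/omex-manifest"
SBML_FMT = "http://identifiers.org/combine.specifications/sbml"


def build_manifest(entries: List[Tuple[str, str]]) -> bytes:
    """Serialise (location, format) pairs as an omexManifest XML document."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    for location, fmt in entries:
        ET.SubElement(root, f"{{{OMEX_NS}}}content", location=location, format=fmt)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(files: Dict[str, bytes], formats: Dict[str, str]) -> bytes:
    """Bundle files plus a manifest describing them into an in-memory zip archive."""
    entries = [(".", ARCHIVE_FMT), ("./manifest.xml", MANIFEST_FMT)]
    entries += [(f"./{name}", formats[name]) for name in files]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", build_manifest(entries))
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_manifest(archive: bytes) -> Dict[str, str]:
    """Return {location: format URI} from the archive's manifest.xml."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    return {c.get("location"): c.get("format")
            for c in root.iter(f"{{{OMEX_NS}}}content")}
```

Because the manifest travels inside the archive, a receiving tool can find the model and its format without guessing from file extensions.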
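Slide 45 describes the project Commons as a catalogue of interlinked objects, organised as Investigations, Studies and Assays/Analyses, with models, SOPs, data and other assets attached. The sketch below models that structure as Python dataclasses, with a lookup that recovers an asset's experimental context. The class names, field names and methods are hypothetical simplifications for illustration. They are not the FAIRDOM data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Asset:
    kind: str    # e.g. "Data", "Model", "SOP", "Workflow"
    title: str
    uri: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assets(self, kind: Optional[str] = None) -> Iterator[Asset]:
        """Walk Investigation -> Study -> Assay and yield linked assets, optionally by kind."""
        for study in self.studies:
            for assay in study.assays:
                for asset in assay.assets:
                    if kind is None or asset.kind == kind:
                        yield asset

    def locate(self, uri: str) -> Optional[Tuple[str, str]]:
        """Return (study title, assay title) for an asset: its experimental context."""
        for study in self.studies:
            for assay in study.assays:
                if any(a.uri == uri for a in assay.assets):
                    return study.title, assay.title
        return None
```

Assets are linked from the hierarchy rather than copied into it, so a model used in several assays is still a single object with one identifier.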
