FAIRy stories: tales from building the FAIR Research Commons

Sep. 2, 2019

  1. FAIRy stories Tales from building the FAIR Research Commons Carole Goble The University of Manchester, UK ELIXIR UK Head of Node FAIRDOM Coordinator Software Sustainability Institute UK carole.goble@manchester.ac.uk INCF Neuroinformatics 2019, Warsaw, September 1-2, 2019
  2. FAIR Guiding Principles for Scientific Data Management and Stewardship, Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  3. A Digital Object Research Commons organising DOs for a field and across fields. A “shared space” where investigators can store, share, access, connect and interact with digital objects generated from research, and use them. Not a Database or Data warehouse. repositories zoo registries zoo https://medium.com/@rgrossman1/a-proposed-end-to-end-principle-for-data-commons-5872f2fa8a47 [Bob Grossman, 2018]
  4. A Digital Object Research Commons organising DOs for a field and across fields. A “shared space” where investigators can store, share, access, connect and interact with digital objects generated from research, and do more data-intensive research. Not a Database or Data warehouse. Ecosystem of pooled community resources Federation with many entry points Collectively created, owned or shared by community Mixed degrees of control
  5. [Ian Fore, 2019] We are all trying to build A FAIR Research Commons
  6. We are all trying to build A FAIR Research Commons
  7. We are all trying to build A FAIR Research Commons
  8. An “ad hoc” commons “in the Wild” Using FAIR as a general principle Fragmented ecosystem of pooled community resources Distributed federation with many entry points and many providers Each has its APIs, Web interfaces, Data Submission, Tool deployment 23 countries 15 communities Including health Held together with standards, metadata mark-up, common identifiers, registries, workflows, shared vision, hard work, love and hope. National datasets Community, Public datasets http://elixir-europe.org
  9. Uber FAIR Life Science Commons Federation over an ecosystem of different fields Ecosystem of FAIR innovative tools Publish FAIR life science data A zoo of Catalogues of tools, data, workflows, computing resources …
  10. Our first FAIRy tale: Finding* stuff in a pre-existing ecosystem EOSC Dataset Minimum Information https://eosc-edmi.github.io/ Minimum information metadata guideline to find and access datasets reusing existing data models and interfaces. Conventions for using schema.org Find, Access and Index Google Dataset Search Small, Lightweight, Viral A little bit of Semantics everywhere *and a bit of provenance, licensing
  11. Our first FAIRy tale: Finding* stuff in a pre-existing ecosystem Structured data descriptors in web pages Low barrier universal mark-up Harvesting, indexing, search Exchange & register without API Automated curation A little bit of Semantics everywhere *and a bit of provenance, licensing
  12. Our first FAIRy tale: Finding* stuff in a pre-existing ecosystem A little bit of Semantics everywhere The Goldilocks Principle
  13. Scale out mark-up for a federation Dataset Properties 91 -> 5 + 8
  14. Data Exchange: Without an API MarRef → BioSamples https://github.com/EBIBioSamples/bioschemas_marref_demo/blob/master/Summary.md Bioschemas markup added to MarRef pages Markup crawled using BuzzBang Data included as a BioSample Curation
  15. A happy ending approaches Endorsed by ELIXIR First types -> Schema.org Goldilocks • Esp. good for small data providers • Types & Profiles debates/explosion • Domain ontology reuse challenges • Elegance vs best for tools • Trolls Community based demonstration (Toxicology, Rare Disease) Validation, mark-up & harvesting tools A subset of the FAIR Principles
  16. Is your resource FAIR? Is your data/workflow/model FAIR from first to last? The FAIR Data Principles vs FAIR the Nice Intention
  17. 2014 - Lorentz workshop 2015 - BioHackathon 2016 - Published Grassroots activity that has become a top down one. Many efforts before…. Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18 2nd Story: FAIR. Once Upon A Time…
  18. https://www.incf.org/activities/standards-and-best-practices/what-is-fair The FAIR principles in the paper… actually only in a break out box
  19. https://www.incf.org/activities/standards-and-best-practices/what-is-fair Machine and human readable data formats and metadata that are compliant with many community standards, that persist, and that tell you the provenance of the data and how it's cross-linked Data and metadata are locatable and accessible by GUIDs, standard access protocols and have the least restrictive licenses
  20. Access Reproducibility Automation Policy Practice Proclamation “enhancing the ability of machines to automatically find and use data or any digital object, and support its reuse by individuals” FAIR Principles more than a fuzzy feeling
  21. The message spread across the lands….
  22. The message spread across the lands….
  23. Simple words are powerful things that can be mangled. Simple concepts are not so simple to implement. One size does not fit all. Beware FAIR zealots and vested interests.
  24. We { are | will be | always have been } FAIR Use our platform / technology to be FAIR. Even if it's not what FAIR meant Only we control FAIR. Our way is the right way. We don’t know what it means to implement FAIR but we want to measure and certify it.
  25. “FAIR principles: interpretations and implementation considerations” J Data Intelligence, coming soon in 2019…. which was still contentious
  26. FAIRy tale -> Reality! Principles are… • An aspiration, a journey. • A call for machine actionability of data and metadata. • Ambiguous. • A spectrum. • Domain respectful. • Implementable with today’s protocols and standards. • A subset of indicators: – ROI cost/benefit, impact, community need, sustainability of repository, quality of content/service…. • Work in progress. Principles are not… • A standard. • Just about humans. • Strict. • Technology specific. • Only for one domain. • About inventing new protocols. • One size fits all. • Anything to do with quality. • Synonymous with open. • Tablets of stone. • Mons et al Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use. 37. 1-8. 10.3233/ISU-170824. • Dunning et al Are the FAIR Data Principles fair? IDCC17
  27. FAIRy Stories about FAIR • It's not about Open • It's not about a resource’s Quality or Impact • It's not actually about Harmonising all metadata to one schema.
  28. The FAIR Hype Clarity Infrastructure Methodologies Incentives Cost/benefit analysis
  29. FAIR is a Journey…. Concepts for FAIR Implementation FAIR Culture FAIR Ecosystem Skills for FAIR Incentives and Metrics Investment in FAIR Turning FAIR into Reality, EC Report, 2018
  30. Review Criteria for Endorsement of Standards and Best Practices, 2018 DOI: 10.5281/zenodo.2535741 Subset of principles applied to standards and best practices The INCF Commons and its Resources themselves? “INCF supports the FAIR (Findable, Accessible, Interoperable, Reusable) principles, and adherence to them is a requirement for an INCF-endorsed standard or best practice.” https://www.incf.org/activities/standards-and-best-practices
  31. Defining and Implementing FAIR Clarity Metrics / Indicators Maturity Models Manual / Automated Assessment FAIRification Methodologies • At the first mile • At the last mile • For the legacy Toolkits, Tools and Services
  32. Compliance Awareness Expectation setting Self-evaluation Reporting Certification Endorsement Judgement Regulation Comparison Monitoring Review Quality Contract By Providers, Users & Community By Community By ???
  33. https://fairshake.cloud/ http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/ https://www.howfairismydata.com/ Unhappy ending • Subjective • Hard to interpret and compare • Weak transparency • Judgemental • Drift to Quality review • Independent of the community • Occasionally barking mad Community pushback
  34. https://fairshake.cloud/ http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/ https://www.howfairismydata.com/ Dunkelziffer “Not everything that can be counted counts. Not everything that counts can be counted” [William Bruce Cameron] “FAIR is non-trivial, and domain specific at anything other than the most superficial level” Wilkinson
  35. Matrix of indicators Maturity levels for each + *The MetricTide, https://responsiblemetrics.org/the-metric-tide/ A FAIR Assessment Transparent evaluation What, Who, How Objective evaluations Narrative feedback on fails Indicators Robustness, Humility, Transparency, Diversity, Reflexivity* Context Community standards Incremental Cost/benefit Not just a score Non judgmental Scope for novelty Transparent evaluation Eat the Dog Food Design-Build-Test-Learn indicators and evaluation
  36. Maturity Model Value Based Assessment Selection Goal Setting Process planning Modelling Transformation Publishing [Susheel Varma] A FAIR Assessment
  37. Capability Maturity Model of entities & their capabilities Indicators and metrics measuring levels Foundational Components FAIRification Process Awareness and Policy Standards and Guidelines People Infrastructure Value Based Assessment Selection Goal Setting Process planning Modelling Transformation Publishing Impl. Outcome: Dataset Persistent Identification Data Set Discovery Machine Readability Data Access and Usage Preservation and Sustainability FAIR Data Maturity Model WG A FAIR Assessment [Oya Deniz Beyan, 2019]
  38. Next meeting 12th September 2019 Sessions at Helsinki RDA Plenary October 2019
  39. Licence Metadata includes information about the licence under which the data can be reused Mandatory Metadata includes licence information in the appropriate element of the metadata standard used Metadata refers to a standard reuse licence Recommended Metadata includes information about consent for reuse (e.g. personal data) Metadata refers to a machine-understandable reuse licence Optional FAIR Data Maturity Model WG An “easy” indicator…. “R1.1. (meta)data are released with a clear and accessible data usage license” Format Allows -- -- -- non-standard human readable access standard open standard reuse & machine readable clear reuse criteria
  40. A trickier indicator… “R1.3. (meta)data meet domain-relevant community standards” FAIR Data Maturity Model WG Mandatory Recommended Metadata complies with a community standard Data complies with a community standard Metadata is expressed in compliance with a machine-understandable community standard Data is expressed in compliance with a machine-understandable community standard Neuroshapes Metadata Portal Reviewers Suppose there isn’t a standard or it’s not up to it? Indicators have to be community specific Librarian’s view point vs Genomics view point? How is it validated? JSON and SHACL validators. How is it captured? Spreadsheets. Interoperability is nearly always purpose specific
  41. • Community governed “indicators” not metrics • Automated objective scale up & out • Sanity check put into practice https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/ Community creates Maturity Indicators Registered, Collections Compliance tests written, registered Resource tested from a starting identifier Report, (Registered) Wilkinson et al “Evaluating FAIR Maturity Through a Scalable, Automated, Community- Governed Framework” bioRxiv, https://doi.org/10.1101/649202 , 2019
  42. “FAIRification” (of legacy datasets) the new magic wand word • Need to do at the same time as define indicators • Needs experts • BYODs • ROI cost/benefit step • Muddle with harmonisation pipelines (compliance to I and R) • Non-trivial • Upstream • Turning into a business https://fairplus-project.eu/ https://www.go-fair.org/fair-principles/fairification-process/
  43. FAIR needs to be at the “first mile”, embedded into investigator practice. Mark Wilkinson Just saying you are FAIR doesn’t make it true. It's uneven and multi-faceted. Identifier use is chaotic. Separating metadata and data is problematic. FAIRification is non-trivial. FAIR is a set of behaviours not a specific technology
  44. Commons for autonomous, self-managing Sys Bio projects Hubs for Projects, People, Data, Models, SOPs, Workflows, Samples First Mile / Last Mile From the infrastructure / standard / commons / database / tool / * To the actual investigator fair-dom.org, fairdomhub.org
  45. I3: references between (meta)data Models + Data + Methods Respect and bridge the ecosystem federated catalogue, integrated context
  46. Respect and bridge the ecosystem federated catalogue, integrated context Public database Local store National infrastructure Secure store Public model repository Github Shared SOPs
  47. Neylon, Knowledge Exchange Report: http://www.knowledge-exchange.info/event/ke-approach-open-scholarship Respect and bridge the ecosystem going the first mile, and the last mile* A miracle of sweat and tears here different scales, different agendas, different incentives Koureas, The ‘last mile’ challenge for European research e-infrastructures https://riojournal.com/article/9933/ New ELIXIR Converge project
  48. [Maryann Martone]
  49. The Tragedy of the FAIR Commons* • A Commons is only as FAIR as its tenants • Project sovereignty • Public good vs personal burden • Professional Stewardship for Projects • Community socialisation and values Nudging *Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/ Based on Matt Spritzer / Brian Nosek figure, COS
  50. More than just data Software, models, workflows, SOPs, Lab Protocols…. 4th (and Last story): FAIR Digital Objects
  51. FAIR Workflows Commons Workflow management system (and registry) zoo* *https://s.apache.org/existing-workflow-systems
  52. FAIR Computational Workflows The point of FAIR (meta)data was to be machine actionable….. and even better if machine generated. • Operate in FAIR not proprietary formats • Support propagation of identifiers, licenses, and AAI • Mint FAIR identifiers, track data provenance, license end products Goble et al 2019 FAIR Computational Workflows https://doi.org/10.5281/zenodo.3268653
  53. FAIR workflows in their own right. Like Software: Principles stretched Versioning Software maturity, quality, maintainability, documentation practices Goble et al 2019 FAIR Computational Workflows https://doi.org/10.5281/zenodo.3268653
  54. FAIR workflows in their own right. Like Data: We can give them machine actionable metadata. Goble et al 2019 FAIR Computational Workflows https://doi.org/10.5281/zenodo.3268653 Describes workflows to be portable, scalable & interoperable with different workflow systems and containerised tools Bundles descriptions, references, files Adds context, provenance, examples, data … Relates to data collections, SOPs, lab protocols… Links CWL descriptions with native workflows
  55. Regulatory Practice robust, safe exchange and reuse of HTS computational analytical workflows http://biocomputeobject.org IEEE P2791 BioCompute Working Group [Vahan Simonyan] Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al “Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results” PLOS Biology 2018, https://doi.org/10.1371/journal.pbio.3000099
  56. A happy ending? • FAIR is work in progress! • Keep grounded, developer friendly and community supported • No-one reads specs. Everyone copies examples. • Nipype CWL is coming! MG-RAST/EBI MGnify Design by workflow blocks Pipeline versions comparison Pipeline exchange Recycling tool descriptions and sub-workflows
  57. What is FAIR, what should be FAIR and how to implement it is not simple. It's not just Good Intentions A social story, not a technical one. Without incentives, cultural normalisation and long term investment it will be just a story. INCF’s FAIR Journey….
  58. Acknowledgements Ian Fore Mark Wilkinson Susanna Sansone Stian Soiland-Reyes Rob Grossman Barend Mons Nick Juty Alasdair Gray Rafael Jimenez Michel Dumontier Michael Crusoe Ian Cottam And all the projects and many more
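
The first FAIRy tale (slides 10 to 14) describes structured schema.org descriptors embedded in web pages, harvested by a crawler and checked against a small set of minimal properties. A minimal sketch of that idea follows, using only the Python standard library. The five property names in `ASSUMED_MINIMAL` are an assumption for illustration (the slides give the count, not the names), and the page and helper names are hypothetical; this is not the BuzzBang crawler or any Bioschemas tool.

```python
import json
from html.parser import HTMLParser

# Assumption for illustration only: the slides say "5 minimal properties"
# but do not name them, so this tuple is a stand-in, not the real profile.
ASSUMED_MINIMAL = ("name", "description", "identifier", "keywords", "url")


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def harvest_datasets(html):
    """Return every top-level JSON-LD object in the page typed as a Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks
            if isinstance(b, dict) and b.get("@type") == "Dataset"]


def missing_minimal(dataset):
    """List the assumed minimal properties that are absent or empty."""
    return [p for p in ASSUMED_MINIMAL if not dataset.get(p)]


# A made-up page; the record and its URLs are hypothetical.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example marine reference genomes",
 "description": "A made-up record used only for this sketch.",
 "identifier": "https://example.org/dataset/1",
 "url": "https://example.org/dataset/1"}
</script></head><body></body></html>"""

datasets = harvest_datasets(PAGE)
for ds in datasets:
    print(ds["name"], "is missing:", missing_minimal(ds))
```

The point of the sketch is the low barrier: the provider only adds a script block to an existing page, and the harvester needs no provider-specific API.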

Editor's Notes

  1. https://www.neuroinformatics2019.org Title: FAIRy stories: tales from building the FAIR Research Commons Findable Accessible Interoperable Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
  2. FAIR was on the opening slides of the meeting Maryann Martone is an author along with me
  3. “Cyberinfrastructure that collocates data, storage, and computing infrastructure with commonly used tools for analyzing and sharing data to create an interoperable resource for the research community.” (Open Commons Consortium) “An environment where participants make use of computing and communication technologies to access shared instruments and data, as well as to communicate with others” (Wikipedia) a database organizes data for a project; a data warehouse organizes data for an organization; and a data commons organizes data for a field or discipline. (Bob Grossman)
  4. https://www.humanbrainproject.eu/en/explore-the-brain/ And the HBP Collaboratory
  5. Incrementing Interop – services, standards, know-how Stuff is massive legacy No one governance
  6. 13 RIs Almost like a Meta-Commons
  7. 91 properties for Dataset Bioschemas Dataset Compliant with Google Dataset Profile 5 minimal properties 8 recommended properties Link to DataCatalog Link to DataDownload
  8. Bioschemas markup added to MarRef pages Markup crawled using BuzzBang Data included as a BioSample Curation Depicted by the External Links
  9. Villains and Heroes
  10. It is context dependent - FAIR for a library, not for plant sciences. Though it all helps! Links to other metadata help, but they may not be harmonised. It's about identifying and describing stuff.
  11. Subset of the FAIR principles BIDS, NeuroML and PyNN are endorsed (https://www.incf.org/resources/incf-endorsed-standards-best-practices) https://www.incf.org/resources/other-standards-best-practices
  12. Beware… beauty is in the eye of the beholder What’s FAIR from a Cataloguer’s perspective may be useless from a biologist’s viewpoint
  13. 50 shades of FAIR – Robert-John Schmidt
  14. FAIRsFAIR Open Consultation on FAIR Data Policies and Practices in Europe
  15. Bioschemas mark-up about licence?
  16. This group really tried this Scale up and scale out automation of indicators and their evaluation Mark volunteers to write compliance tests
  17. Cookbooks, BYODs, Tools A miracle occurs with very clever people Running at the same time as defining FAIR
  18. “50 Shades of FAIR” Identifier use is chaotic, for both data and metadata, with no clear way to point from one to another. Separating what is metadata and what is data given a URI is a problem. Content negotiation is NOT how you differentiate data from metadata. It's how you negotiate serialization of the identified thing. FAIR is a set of behaviours (use of tech and people) not a specific technology
  19. Born FAIR
  20. Hence stuff like ReproNim need Community engagement: The ‘last mile’ challenge for European research e-infrastructures Dimitrios Koureas, ed
  21. HIDDEN SLIDE
  22. From the opening talk
  23. HIDDEN SLIDE
  24. Villains mentioned: PIs and senior faculty Heroes: PhD students
  25. Computational and SOPs (here it's Computational)
  26. FAIR Software should facilitate making FAIR Data.
  27. Maintainability Testing Portability Contributor policy Identity Copyright Licenses Documentation Sustainability
  28. Join in! Like Data: many FAIR Data Principles apply Repositories (F) Standardising descriptions of workflow, provenance and components (I, R): CWL, PROV Metadata about, combining and referencing between components (I, R): Research Objects
  29. HIDDEN SLIDE The EOSC Life computational workflows stack
  30. Standardize exchange of HTS workflows for regulatory submissions between FDA, pharma, bioinformatics platform providers and researchers. Inspect and replicate the computational analytical workflow to review and approve the bioinformatics
  31. HIDDEN SLIDE
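
Slide 39's “easy” licence indicator (R1.1), combined with slide 35's call for narrative feedback rather than just a score, can be sketched as a small check over a metadata record. This is an illustration under stated assumptions, not the FAIR Data Maturity Model WG's method: the `license` key, the tiny allow-list of “standard” licences, and the rule that an http(s) URL counts as “machine-understandable” are all assumptions made here.

```python
from urllib.parse import urlparse

# Assumption for illustration: a tiny allow-list standing in for
# whatever a community would agree counts as a "standard" reuse licence.
ASSUMED_STANDARD_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}


def assess_licence(metadata):
    """Return (indicator, passed, feedback) tuples instead of a single score."""
    licence = str(metadata.get("license") or "")
    has_licence = bool(licence.strip())
    # Assumption: an http(s) URL is treated as machine-understandable.
    is_url = urlparse(licence).scheme in ("http", "https")
    is_standard = licence in ASSUMED_STANDARD_LICENCES
    checks = [
        ("includes licence information", has_licence,
         "Add a licence statement to the metadata."),
        ("refers to a standard reuse licence", is_standard,
         "Point to a widely used licence rather than bespoke terms."),
        ("refers to a machine-understandable reuse licence", is_url,
         "Give the licence as a resolvable URL, not free text."),
    ]
    # Feedback is only attached to indicators that fail.
    return [(name, passed, "" if passed else advice)
            for name, passed, advice in checks]


def report(metadata):
    """Print one line per indicator, with advice on fails."""
    for name, passed, advice in assess_licence(metadata):
        status = "pass" if passed else "fail"
        print(f"{status}: Metadata {name}. {advice}".rstrip())


# Hypothetical record: a free-text licence passes only the first indicator.
report({"name": "Example dataset", "license": "CC-BY 4.0"})
```

Even this “easy” indicator needs community decisions (which licences are standard, what counts as machine-understandable), which is the slide's point about indicators being community specific.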