
FAIRy Stories

Findable Accessible Interoperable Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a bunch of projects and initiatives in the Life Sciences. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVeL), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?

Published in: Science

  1. 1. FAIRy stories for Christmas Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk ELIXIR-UK, FAIRDOM, ISBE, BioExcel CoE, Software Sustainability Institute Open PHACTS SWAT4HCLS 2017, 5th Dec 2017, Rome
  2. 2. Once upon a time in a land far, far away lived a KinG … Who wanted all data to be FAIR….
  3. 3. Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J.G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A.C ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, Barend Mons Wilkinson Dumontier Schultes Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  4. 4. Queens… And FAIRY GODMOTHERS Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  5. 5. Machine Processable Metadata Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18 • Catalogues, Search, Stores • Metadata Standards • Standard Access protocols • Identifiers, Policies • Authorised Access • Licensing
  6. 6. FAIR spread across the lands …… VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
  7. 7. FAIR spread across the lands ……
  8. 8. Stakeholder FAIR Awareness UK Institutional Research Data Management guidance* * Jisc: Final Report FAIR in Practice, Nov 2017 Government, Funder, Publisher, National & International Infrastructures… Institutional Researchers FAIR spread across the lands …… BUT not necessarily all the peoples
  9. 9. FAIR spread across the lands ……
  10. 10. Moral: Names are important Spinning (metadata) straw into gold Be careful what you promise…
  11. 11. Me Too! staking claims we { are | will be | always have been } FAIR a rallying flag
  12. 12. Hype Curve
  13. 13. http://dx.doi.org/10.1101/225490 http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/ http://fairmetrics.org/
  14. 14. Beware… beauty is in the eye of the beholder What’s FAIR from a cataloguer’s perspective may be useless from a biologist’s viewpoint
  15. 15. My Semantic FAIRy Stories The Scientist and the FAIR Commons The MAGIC Research Object little semantics and the big Web
  16. 16. The Scientists and the FAIR Research Commons Supporting mixed types and many researchers FAIR
  17. 17. The Scientists and the FAIR Research Commons Find: ID resolution Faceted Navigation Search, RDF SPARQL endpoint, APIs A Commons for Workflows myexperiment.org A Commons for Systems Biology Projects fairdomhub.org investigation study assay/analysis data models SOPs
  18. 18. Community & Project Commons Structured organisation across standards and types Federation over autonomous resources Laissez-Faire Independent Users Ecosystem of types, stores and metadata
  19. 19. Own little houses: from straw to bricks Permission controls Staged sharing Licenses Negotiated access Embargos Open
  20. 20. Schema Dublin core Datacite, DCAT, Bioschemas Catalogue Level Investigation Studies Assay/Analysis Content level Persistent Identifiers Content level subject thematic standards Content level Stratified Linked Data
  21. 21. Getting the best FAIR metadata…. FAIR Access – myExperiment -> open – FAIRDOM -> friends and family – Hand over straw houses to FAIRDOMHub “The Tragedy of the Commons”* – Metadata quality and quantity – Identifier hygiene – Curation & contributions – Public good vs personal burden – Incorporation into processes – Community socialisation - obligations mismatches. Credit! *Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
  22. 22. project PIs, funders time burden, distrust project PIs, funders PALs – juniors, advocates and Cinderellas templates, tools benefit
  23. 23. Moral: Incentives
  24. 24. Bake in “Semantic Nudging” Ontologies stealthily embedded in Excel spreadsheet templates Added value - Model execution Vanity, guilt, shaming Automation rightfield.org.uk
  25. 25. Cinderella? The Spreadsheet
  26. 26. “The Last Mile”* -> The First Mile FAIR from bench to cloud Last mile - Infrastructure view First mile - researcher / resource view * Dimitrios Koureas et al, Community engagement: The ‘last mile’ challenge for European research e-infrastructures, Research Ideas and Outcomes 2: e9933 (20 Jul 2016) https://doi.org/10.3897/rio.2.e9933
  27. 27. the generic vs specific zig zag path
  28. 28. The MAGIC Research OBJECT GENERIC Framework For exchange, reproducibility, Preservation, active artefacts Universal Catering, bottomless content FAIR
  29. 29. The FAIR Research Object import, exchange, portability, maintenance ISA-TAB Bergman et al COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
  30. 30. workflow engine Workflow Run Provenance Inputs Outputs Intermediates Parameters Configs Narrative Exchange between people & platforms Commons store, catalogue & archive Reproduce preserve, port, repair Activate re-compute, mix, compare, evolve The FAIR Workflow Research Object
  31. 31. researchobject.org Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ Standards-based generic metadata framework for bundling internal and external resources with context citable reproducible packaging Data used and results produced in study Methods employed to produce/analyse data Provenance and settings for the experiments People involved in the investigation Annotations about these resources:- understanding & interpretation
  32. 32. Linking across ROs and into the Linked Open Data Cloud • Recording & linking together the components of an experiment • Linking across experiments • Linked ROs • A Semantic Web of Research Objects • Resource References – a bottomless pot
  33. 33. Technology Independent. The least possible. The simplest feasible. Low tech. Low user overhead and thin client Graceful degradation. FAIR ROs Desiderata
  34. 34. Construction Content Profile Types Identification to locate things Aggregates to link things together Annotations about things & their relationships Type Checklists what should be there Provenance where it came from Versioning its evolution Dependencies what else is needed Manifest checklist Type Checklists describing what should be there Container Metadata Objects
  35. 35. Construction http://www.researchobject.org/specifications/ RO Model Identifiers: URI, RRI, DOI, ORCID W3C Web Annotation Vocabulary Open Archives Initiative Object Exchange and Reuse Aggregation Annotation Container
  36. 36. Content Profiles. Progression Levels Container
  37. 37. Profile http://purl.org/minim/description W3C Shape Specs *Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked Data", IEEE eScience 2012 (Chicago, USA, October 2012), http://dx.doi.org/10.1109/eScience.2012.6404489 validators / viewers Minim model for defining checklists* multiple profiles for different consumers Generic Specifics RO-SHOW Container
  38. 38. Linked Data Pharmacological Discovery Platform Data Releases Dataset “build” RO Library Earth Sciences Public Health Learning Systems Asthma Research e-Lab sharing and computing statistical cohort studies Happy Endings! ISA based Packaging, Systems Biology commons & publishing Managing distributed unmovable large datasets for Biomedical HTS analytic pipelines * * Chard et al, I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618
  39. 39. Happy Ending – Workflows Biomedical HTS analytic pipelines Manifest description of CWL workflows + rich context + provenance + other objects + snapshots Precision medicine NGS pipelines regulation* *Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al, Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783 EDAM Biomolecular modelling Portable Workflows
  40. 40. BagIT, JSON(-LD), schema.org https://dokie.li/ https://linkedresearch.org/ Manifest: Schema.org, JSON-LD, RDF Archive: .tar.gz Reproducible Document Stack project eLife, Substance and Stencila BagIT data profile + schema.org JSON-LD annotations Many Roads
  41. 41. Morals Incremental, open frameworks hard work – Extensive reuse of standards is tricky – Too Generic vs Too Specific – Multi-element type & nesting challenges – ROs with a Purpose – Examples & templates Representational Beauty vs Tools – Easy to make, hard to consume – Be specific, be developer friendly – Profiles & tools critical Patience is a virtue
  42. 42. Bioschemas: Little Semantics and the big web Being and keeping light, small and viral FAIR
  43. 43. Structured data markup for web pages Schema.org adds simple structured metadata markup to web pages & sitemaps for harvesting, search and summary snippet making. Search engines often highlight websites containing Schema.org Widespread commercial and open source infrastructure creates a low barrier to adoption
  44. 44. Goldilocks & the 3 Use Cases Standardised metadata mark-up Metadata published & harvested without APIs or special feeds 3 Use Cases 1. Finding/Citing, 2. Summary snippets 3. Metadata exchange / ingest Goldilocks • Reuse ubiquitous commercial platform • The least possible change, the max possible reuse • Minimum properties – 6 • Reuse domain ontologies – we are not reinventing them! Commodity Off the Shelf tools App eco-system Repository Level Content type level
  45. 45. Standardised metadata mark-up Metadata published & harvested without APIs or special feeds Commodity Off the Shelf tools App eco-system Repository Level Content type level Goldilocks & the 3 Use Cases
  46. 46. Training materials Events Organizations Data Software Lab Protocols schema.org tailored to the Biosciences for FAIR simple structured metadata markup on web pages & sitemaps bio.tools
  47. 47. schema.org tailored to the Biosciences simple structured metadata markup on web pages & sitemaps • Specific for life sciences • Extends existing Schema.org types • Focused on few types and well defined relationships • Minimum properties for finding and accessing data • Best practices for selected properties • Managed by Bioschemas.org • Generic data model • Generous list of properties to describe data types • Managed by Schema.org
  48. 48. Tailored schema.org to improve Findability and Accessibility in Bioscience Layer of constraints + documentation + extensions Leyla Garcia. Poster & Flashtalk
  49. 49. 2-3 Oct 2017, Hinxton, ~50 people Ideally 6 concepts Reuse ontologies schema.org Real mark-up Tools Find, Cite, Snippets, Metadata exchange Community
  50. 50. http://www.france-bioinformatique.fr/en/training_material https://search.google.com/structured-data/testing-tool Applied Drupal 7 schema.org extension Took about 2 hours Included in TeSS in an hour [Niall Beard]
  51. 51. MORALs Community Buy-in Worth it • First specs & main mechanism for training • Google / Schema & ELIXIR support • Research Schemas for European Open Science Cloud pilot Goldilocks works but is hard work • Types & Profiles debates • Elegance vs best for tools • Reuse domain ontologies • Validation, mark-up & harvesting tools Trolls
  52. 52. How are we FAIRing? Different levels with different emphasis It’s an Ecosystem, not a single solution • Catalogues, Search, Stores • Metadata Standards • Standard Access protocols • Identifiers, Policies • Authorised Access • Licensing
  53. 53. smart rebrand launch Still hard, same stuff Rally big communities and grassroots initiatives Examine our capabilities There is no magic
  54. 54. FAIRy Land PEST Political Economic Social Technical
  55. 55. Platform & user buy-in from the get-go Passionate, dedicated leadership Seeding critical mass Community Tools Driver Bottom up initiatives fostered by big umbrella infrastructures FAIR Semantic Village* Simple & Lightweight Ramps not revolutions FAIR with a PURPOSE & With PEOPLE FAIR Support typical developer – Familiarity – JSON, APIs *Deb McGuinness
  56. 56. Research for FAIR FAIR representation • The Semantic Web Automated metadata • Deep learning, machine learning, AI • Text Mining, Ontology mapping Social metadata • User Experience, Crowd Sourcing • Choice architecture FAIR action • Blockchain • Virtualised & remote execution • Image processing • Preservation & portability • Provenance tracking, object trajectories • Engineering & Design, Ethics, Social Sciences Research + Developer Practitioner practices
  57. 57. Mark Robinson Norman Morrison Paul Groth Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Iain Buchan Caroline Jay David De Roure Oscar Corcho Steve Pettifer Khalid Belhajjame Jun Zhao Phil Crouch Lilian Gorea, Oluwatomide Fasugba Stian Soiland-Reyes Michael Crusoe Rafael Jimenez Alasdair Gray Barend Mons Sean Bechhofer Michel Dumontier Mark Wilkinson Leyla Garcia Stuart Owen Katy Wolstencroft Finn Bacall Alan Williams Wolfgang Mueller Olga Krebs Jacky Snoep Matthew Gamble Raul Palma Mark Musen http://www.researchobject.org http://www.myexperiment.org http://wf4ever.org http://www.fair-dom.org http://www.fairdomhub.org http://seek4science.org http://rightfield.org.uk http://www.bioschemas.org http://www.commonwl.org http://www.bioexcel.eu http://www.openphacts.org
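
The Bioschemas slides describe simple schema.org markup embedded in web pages, constrained by a small set of minimum properties, and the Research Object slides describe checklists of "what should be there". The following is a minimal sketch of how those two ideas might fit together, not the Bioschemas or Minim tooling itself. The six property names in `MINIMUM_PROPERTIES`, the helper function names and the example.org values are all assumptions made for illustration; the slides say only that a profile should ideally have about six minimum properties.

```python
import json

# Hypothetical "minimum properties" checklist. These six names are
# illustrative assumptions, not the published Bioschemas profile.
MINIMUM_PROPERTIES = ["@type", "name", "description", "url", "identifier", "license"]


def make_dataset_markup(name, description, url, identifier, license_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "license": license_url,
    }


def missing_properties(markup, required=MINIMUM_PROPERTIES):
    """Return the required properties that are absent or empty in the markup."""
    return [prop for prop in required if not markup.get(prop)]


def to_script_tag(markup):
    """Serialise the markup as a JSON-LD script element for a web page."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; example.org is not a real repository.
example = make_dataset_markup(
    name="Example assay data",
    description="Placeholder dataset used only to illustrate the markup.",
    url="https://example.org/datasets/1",
    identifier="https://example.org/datasets/1",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

if __name__ == "__main__":
    print(missing_properties(example))
    print(to_script_tag(example))
```

A page author would paste the output of `to_script_tag` into the HTML of a dataset landing page, and a harvester could run `missing_properties` over whatever it extracts. Keeping the checklist as a plain list of names follows the "least possible change" spirit of the slides, but real profiles also constrain value types and cardinality, which this sketch ignores.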
