Reproducible Research: how could Research Objects help

Reproducible Research: how could Research Objects help, given at 21st Genomic Standards Consortium Meeting
Dates: May 20-23, 2019
https://press3.mcs.anl.gov/gensc/meetings/gsc21/

Published in: Science

  1. Reproducible Research: how could Research Objects help. Professor Carole Goble, The University of Manchester, UK; ELIXIR Interoperability & Head of Node UK; Software Sustainability Institute UK. carole.goble@manchester.ac.uk. 21st Genomic Standards Consortium meeting, 21 May 2019, Vienna.
  2. Being FAIR
  3. Flipping through Nature in April 2019… Flawed Design & Practice. Poor Reporting & Availability.
  4. Why? Scientific publications • announce a result • convince readers to trust it • enable a peer to reuse it or compare against it. Wet Lab, Experimental science • describe the results • provide a clear enough description of the materials and protocol to allow successful repetition and extension [Jill Mesirov 2010]. Dry Lab, Computational science • describe the results • provide the complete software development environment, data, instructions, techniques [David Donoho 1995].
  5. Reporting & Availability irreproducibility? Not reporting the design sufficiently. Not enough metadata on the data or methods to understand, repeat, compare, rerun. The method isn’t transparently, comprehensively and accurately reported. Not being able to access the data, rerun the method in your environment, have all the components you need. Portability, preservation, packaging, hosting, robustness, description, ids, steps, provenance, access, dependencies.
  6. Flipping through Nature in April 2019… Reproduce and reuse computations. Transparently communicate the way computations are performed. Disambiguate interpretation of inputs/parameters/results. Safely (re)run computations ported onto different platforms. Human and computer readable definitions for the provenance of computation, types for the data and results.
  7. The Data and the Methods. Method Reproducibility: the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated. Result Reproducibility: the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible. Procedure = Software, SOP, Lab Protocol, Workflow, Script. Tools, Technologies, Techniques. A whole bunch of them together. Goodman et al., Science Translational Medicine 8 (341), 2016.
  8. Flipping through Nature in April 2019… DATA. UMGS genomes • in ENA ERP108418. Other datasets • ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/umgs_analyses/ Supplementary Tables • Excel spreadsheets at the publishers.
  9. Flipping through Nature in April 2019… METHODS. Pointers to scripts, tools and toolkits • https://pypi.org/project/mg-toolkit/ • R v3.4.1; Python v2.7.5 and v3.6.5; SPAdes v3.10.0; MetaBAT v2.12.1; BWA v0.7.16; samtools v1.5; CheckM v1.0.7; Mash v2.0; MUMmer v3.23; specI v1.0; MUSCLE v3.8.31; DIAMOND v0.9.17.118; prodigal v2.6.3; InterProScan v5.27-66.0; antiSMASH 4; ALDEx2; sourmash v2.0.0a4; phytools v0.6-44; GhostKOALA; VirFinder v1.1; CompareM v0.0.23; MEGAHIT v1.1.3; MetaWRAP v1.0; MaxBin v2.2.4; mltools v0.3.5; RAxML v8.1.15; CD-HIT v4.7; tRNAscan-SE v2.0; INFERNAL v1.1.2; dRep v2.2 • Parameter settings? Configurations?
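The question on the slide above (versions are listed, but parameter settings and configurations are not) can be illustrated with a small sketch. It assumes a plain JSON record per tool; the three versions are copied from the slide, while the parameter values and the `methods_record` helper are hypothetical and not taken from the paper.

```python
import json

# Tool versions copied from the slide.
tools = {
    "SPAdes": "3.10.0",
    "MetaBAT": "2.12.1",
    "CheckM": "1.0.7",
}

# Hypothetical settings for illustration only; the slide notes these are not reported.
parameters = {
    "SPAdes": {"mode": "meta", "threads": 8},
}


def methods_record(tools, parameters):
    """Combine version and settings into one record per tool, sorted by tool name."""
    return [
        {"tool": name, "version": version, "parameters": parameters.get(name, {})}
        for name, version in sorted(tools.items())
    ]


record = methods_record(tools, parameters)
print(json.dumps(record, indent=2))
```

A tool with an empty `parameters` entry makes the reporting gap visible to a machine as well as to a reader.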
  10. Dissemination Fragmentation. De-contextualised; static, fragmented; lost semantic linking; buried in a PDF figure. By contrast: contextualised; active, unified; semantic linking.
  11. Community specific approaches … COMBINE Archive (Systems Biology, Systems Medicine): zip-like file with a manifest & metadata - Bundle files - Keep provenance - Exchange data - Ship results. Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D; SEMS, University of Rostock. https://sems.uni-rostock.de/projects/combinearchive/ Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
  12. Research Objects. Digital objects* • PIDs • Metadata, bundled together**. *Turning FAIR into reality: Final report and action plan from the European Commission expert group on FAIR data, Nov 2018. **Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
  13. Research Object Framework. Bundle together and relate digital resources with their context into a unit. Machine processable metadata in common and specific to different object types. snapshot | cite | exchange. Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
  14. Bigger on the inside than the outside. Container: a Digital Package Object Type composed of many interrelated elements that bundles together and relates digital resources of a scientific investigation with context. A Metadata Object represents properties in common across all research artefact types: common PIDs and metadata. “Unbounded” Objects: external references to things.
  15. Standards! Container Profile: archive formats to encode the object. Manifest Construction Profile: model & format for constructing the manifest (OADM, OAI). Manifest Content Profile: about the Object; what is in the Object; tailored to the Object type; validate what we expect to be there (Domain Ontologies, PROV). GitHub.
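The container, manifest and validation roles on the slide above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Research Object Bundle or COMBINE archive format: the `manifest.json` layout, the `aggregates` field name, the file names and both helper functions are chosen here for the example only.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def build_bundle(files, bundle_path):
    """Write the files into a zip plus a manifest.json listing each path and SHA-256."""
    manifest = {"aggregates": []}
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        for name, content in files.items():
            bundle.writestr(name, content)
            manifest["aggregates"].append(
                {"path": name, "sha256": hashlib.sha256(content).hexdigest()}
            )
        bundle.writestr("manifest.json", json.dumps(manifest, indent=2))


def validate_bundle(bundle_path, expected_paths):
    """Return a list of problems: expected files not listed, or checksum mismatches."""
    problems = []
    with zipfile.ZipFile(bundle_path) as bundle:
        manifest = json.loads(bundle.read("manifest.json"))
        listed = {e["path"]: e["sha256"] for e in manifest["aggregates"]}
        for path in expected_paths:
            if path not in listed:
                problems.append(f"missing: {path}")
        for path, digest in listed.items():
            if hashlib.sha256(bundle.read(path)).hexdigest() != digest:
                problems.append(f"checksum mismatch: {path}")
    return problems


# Hypothetical content; a content profile would say which paths must be present.
with tempfile.TemporaryDirectory() as tmp:
    bundle_path = Path(tmp) / "example.zip"
    build_bundle(
        {"workflow.cwl": b"cwlVersion: v1.0\n", "inputs/reads.txt": b"ACGT\n"},
        bundle_path,
    )
    ok = validate_bundle(bundle_path, ["workflow.cwl"])
    missing = validate_bundle(bundle_path, ["workflow.cwl", "README.md"])

print(ok, missing)
```

Here the list of expected paths plays the part of a very small content profile: it states what a bundle of this type should contain, so a checker can report what is absent.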
  16. Workflow: EBI’s MGnify metagenomics pipelines. Workflow description. Input data files. Command line tools, containerised tools, workflows. Output data files.
  17. Why, Who, How, What, When, Where. SOP, Lab Protocol, Publication. Workflow Content Profile: Workflow, Workflow Run, Workflow “Node”.
  18. Common Workflow Language (https://www.commonwl.org/): describes computational workflows to be portable, scalable & interoperable with different workflow systems and containerised tools. Research Object: bundles the CWL workflow descriptions; adds context, provenance, examples, validation data …; snapshots the workflow; relates it to other objects - studies, data collections, SOPs and Lab protocols …
  19. Description of tools, inputs and outputs. Ontology markup using EDAM. CWL files in GitHub, or export from native platforms. Bundles it all together: example input files, validation tests, links to research study. Software components are containerised to make them portable and handle software dependencies.
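To make "description of tools, inputs and outputs" concrete, here is a small sketch of a CWL tool description built as a Python dictionary and serialised to JSON, which CWL accepts alongside YAML. The tool is modelled on the introductory `echo` example from the CWL user guide rather than on any MGnify pipeline step, and `describe_interface` is a hypothetical helper written for this illustration.

```python
import json

# A minimal CWL CommandLineTool: run `echo` with one string input.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    "outputs": [],
}


def describe_interface(tool):
    """Summarise the declared inputs so a reader can see what the tool expects."""
    return {name: spec["type"] for name, spec in tool["inputs"].items()}


cwl_text = json.dumps(echo_tool, indent=2)
print(cwl_text)
print(describe_interface(echo_tool))
```

Because the description is data rather than prose, the same document can be read by people, checked by a validator, and placed in a bundle next to example inputs and tests.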
  20. Manifest. Annotations about the content of the manifest. SHACL. Create, Validate, Curate, Explore. https://view.commonwl.org/workflows/github.com/mnneveau/cancer-genomics-workflow/blob/master/detect_variants/detect_variants.cwl For the JSON fans…
  21. For example: CWL Provenance. Data lineage and licence/citation tracking.
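Data lineage, as named on the slide above, amounts to following "was derived from" links back from a result to everything it depends on. The sketch below shows that traversal only; the file names and the `derived_from` table are hypothetical, and a real CWL provenance record would carry far more (agents, activities, timestamps, licences).

```python
# Hypothetical derivation records: each item maps to the items it was derived from.
derived_from = {
    "bins.fasta": ["assembly.fasta"],
    "assembly.fasta": ["reads_1.fastq", "reads_2.fastq"],
    "report.tsv": ["bins.fasta", "reference.db"],
}


def lineage(item, derived_from):
    """Return every upstream item that `item` depends on, directly or indirectly."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


print(lineage("report.tsv", derived_from))
```

The same walk supports licence and citation tracking: once the upstream items are known, their licences and citations can be collected and attached to the result.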
  22. EDAM Ontology. CWL-enabled WfMS. Which machines? Parameters? Configurations?
  23. Flipping through Nature in April 2019… METHODS
  24. IEEE P2791 BioCompute Working Group, http://biocomputeobject.org Inspect and replicate the computational analytical workflow to review and approve the bioinformatics. Standardize exchange of HTS workflows for regulatory submissions between FDA, pharma, bioinformatics platform providers and researchers. “Parametric domain”.
  25. Sharing: commons, co-development, publishing. Exchange: rich description, portability, interoperability, reproducibility, recomputation. Active: releasing, changes, updates. Stewardship: preservation, maintenance. Challenges: Objectness • nesting, citing, lifecycles, governance… Content Profiles • machine processable accuracy and detail. Tooling • embedded into platforms, on ramps.
  26. NIH Data Commons. Big data distributed over multiple locations, efficiently and safely moved on demand. ROs are verified collections of references [Chard, et al 2016].
  27. European Open Science Cloud Commons. Tools and Workflow Collaboratories. RO-based Workflow Commons.
  28. Getting into Practice
  29. Getting into Practice. commonwl.org
  30. Acknowledgements: Stian Soiland-Reyes, Michael Crusoe, Rob Finn, Kyle Chard, Daniel Garijo, Barend Mons, Sean Bechhofer, Matthew Gamble, Raul Palma, Jun Zhao, Mark Robinson, Alan Williams, Norman Morrison, Tim Clark, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Ian Cottam, Susanna Sansone, Kristian Garza, Catarina Martins, Iain Buchan, Carl Kesselman, Ian Foster, Vahan Simonyan, Ravi Madduri, Raja Mazumder, Gil Alterovitz, Denis Dean II, Durga Addepalli, Wouter Haak, Anita De Waard, Paul Groth, Oscar Corcho; CWL and RO communities. Project ID: 675728