Mtsr2015 goble-keynote

Metadata and Semantics Research Conference, Manchester, UK 2015

Research Objects: why, what and how
In practice the exchange, reuse and reproduction of scientific experiments is hard: it depends on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Nor should they be viewed as second-class artifacts tethered to publications; they are the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects. The term has become widespread. However, what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?

I’ll present our practical experiences of the why, what and how of Research Objects.

Published in: Science

  1. 1. Research Objects: why, what and how Professor Carole Goble CBE FREng FBCS The University of Manchester, UK The Software Sustainability Institute, UK carole.goble@manchester.ac.uk researchobject.org Metadata and Semantics Research Conference 2015, 9-11 Sept 2015, Manchester, UK
  2. 2. Prologue e-Lab Collabs. & Shared Asset Repositories Knowledge, Metadata, Linked Data, Ontologies Software Engineering for Scientists Computational Workflow Systems Reproducibility Micro Publications Open Science Research Objects Linked Data for Science Scholarly Comms
  3. 3. Prologue Biodiversity Systems Biology Synthetic Biology Astronomy Helio Physics Genomics Public Health Epidemiology Digital Preservation Social Science Pharmacology
  4. 4. Knowledge Turning, Info Flow Barriers to Cure • Access to scientific resources • Coordination and Collaboration • Flow of Information http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
  5. 5. [Pettifer, Attwood] http://getutopia.com
  6. 6. Virtual Witnessing* Scientific publications: • announce a result • convince readers the result is correct “papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension” Jill Mesirov, Broad Institute, 2010** **Accessible Reproducible Research, Science 22 January 2010, Vol. 327 no. 5964 pp. 415-416, DOI: 10.1126/science.1179653 *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
  7. 7. Bramhall et al., QUALITY OF METHODS REPORTING IN ANIMAL MODELS OF COLITIS, Inflammatory Bowel Diseases, 2015: “Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…” 50 papers randomly chosen from 378 manuscripts in 2011 that use Burrows-Wheeler Aligner for mapping Illumina reads: 31 no s/w version, parameters, exact version of genomic reference sequence; 26 no access to primary data sets. Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
  8. 8. “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.” Prof Phil Bourne Associate Director, NIH Big Data 2 Knowledge Program
  9. 9. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
  10. 10. From Manuscripts to “Research Objects” Multi-various, citable research products/assets
  11. 11. From manuscripts to “Research Objects”
  12. 12. From manuscripts to “Research Objects” Pre-packaged Docker images containing a bioinformatics tool and standardised interface through which data and parameters are passed. http://bioboxes.org
  13. 13. FAIR Research, crossing silos From Manuscripts to “Research Objects” Datasets, Data collections Standard operating procedures Software, algorithms Configurations, Tools and apps, services Codes, code libraries Workflows, scripts System software Infrastructure Compilers, hardware Fragmentation
  14. 14. FAIR RO Distributed Commons NIH BD2K, EU FAIRPorts…. Pooled Resources
  15. 15. NIH BD2K Commons and Research Objects https://datascience.nih.gov/commons
  16. 16. Why Research Objects? • Computational Workflows / Scripts – Multi-step, nested. – Data, executable codes (remote and local), libraries – Preservation, Repair – Reproducibility • Systems Biology – Models, data (construction, validation, predicted), SOPs, samples, articles – Structured Investigations, Studies, Assays – Exchange – Reproducibility
  17. 17. Why Research Objects? • Computational Workflows / Scripts – Multi-step, nested. – Data, executable codes (remote and local), libraries – Preservation, Repair – Reproducibility • Systems Biology – Models, data (construction, validation, predicted), SOPs, samples, articles – Structured Investigations, Studies, Assays – Exchange – Reproducibility Commons Commons myexperiment.org fair-dom.org
  18. 18. "Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al Workflow Commons
  19. 19. Instruments, Materials, Method Data Scopes Input Data Software Output Data Config Parameters Methods techniques, algorithms, spec. of the steps Materials datasets, parameters, algorithm seeds Experiment Instruments codes, services, scripts, underlying libraries Laboratory sw and hw infrastructure, systems software, integrative platforms Setup Drummond, Replicability is not Reproducibility: Nor is it Good Science, online Peng, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
  20. 20. Instruments, Materials, Method Read. Run. Remake Science changes, experiments & results vary, So do labs. Instruments break, labs decay. Zhao, et al . Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012 http://atyourservice.blogs.xerox.com/files/2011/09/cloning-results-may-vary.jpg
  21. 21. Reproducibility: working. reporting submit article and move on… publish article Research Environment Publication Environment Peer Review
  22. 22. FAIR Reproducibility Find, Access, Interoperate, Reuse
  23. 23. https://doi.org/10.15490/seek.1.investigation.56
  24. 24. FAIRDOM Metadata framework link studies, link assets, map content to. Common elements and relationships between things produced and used in experiments. Common elements Specific elements for specific data types. Just Enough Results Model http://seek4science.org/JERMOntology http://isatab.sourceforge.net/format.html
  25. 25. Penkler et al (2015) FEBSJ 282:1481-1511 https://dx.doi.org/10.1111/febs.13237
  26. 26. Consumers Producers Project Repositories harvesting link Standards organise validate Native Commons Repositories Why Research Objects? Compound, nested, scattered, yet interconnected COMMONS
  27. 27. Why Research Objects? Preserved, portable research products. Snapshots. inter-platform exchange, reproducibility Commons New Discovery
  28. 28. Cross-Institutional e-Lab fragmentation parts scattered across subject specific/general resources 101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015, http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
  29. 29. Why Research Objects? Active research products, snapshots • Fork. • Merge. • Version. • Cite. • Snapshot. • Live. [Martin Scharm] Haus et al, BMC Systems Biology, 2011, 5:10 Solvent production by Clostridium acetobutylicum
  30. 30. F1000Research Living Figures versioned articles, in-article data manipulation R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482 Simply data + code Can change the definition of a figure, and ultimately the journal article Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176 Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates. Data updates time-stamped. New conclusions added via versions.
  31. 31. Publish, Release (like Software) An “evolving manuscript” would begin with a pre-publication, pre-peer review “beta 0.9” version of an article, followed by the approved published article itself, [ … ] “version 1.0”. Subsequently, scientists would update this paper with details of further work as the area of research develops. Versions 2.0 and 3.0 might allow for the “accretion of confirmation [and] reputation”. Ottoline Leyser […] assessment criteria in science revolve around the individual. “People have stopped thinking about the scientific enterprise”. http://www.timeshighereducation.co.uk/news/evolving-manuscripts-the-future-of-scientific-communication/2020200.article
  32. 32. Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012 Software-like Release paradigm • Agile development methods • Free Open Source Software methods https://tctechcrunch2011.files.wordpress.com/2011/05/tcdisrupt_tc-9.jpg
  33. 33. Knowledge Turning interpret Commons FAIR Research Products Reproducibility Interpretation Comparison Preservation Portability Release Active Research Research Object means ends drivers Why Summary Framework Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, DOI: 10.1007/978-3-642-37186-8_1
  34. 34. Multi-various products, platforms, resources. First class citizens - id, manage, credit, track, profile, focus A Framework to Bundle, Port and Link (scattered) resources, related experiments. Metadata Objects that carry Research Context. Units of exchange. Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004
  35. 35. Metadata Objects Evolving multi-typed, stewarded, sited, authored span research, researchers, platforms, time Contributions. Content. closed <-> open local <-> alien embed <-> refer Stewardship. Citation. Bigger on the inside than the outside, Content may be logically or physically inside TARDIS: Time and Relative Dimension in Space Scholarship https://meditationsfromzion.files.wordpress.com/2013/05/tardis.jpg
  36. 36. What and How Framework Manifest Core model using standards Annotation profiles progressive extensions Implementation Profiles using legacy & commodity platforms Policies Tools Lifecycle Stewardship Training Principles & Conventions API specification Metadata formats
  37. 37. Technology Independent. The least possible. The simplest feasible. Low tech. Graceful degradation. The Research Object Desiderata
  38. 38. Manifests and Containers Container Packaging: Zip files, Docker images, BagIt, … Catalogues & Commons Platforms: FAIRDOM SEEK, Farr Commons CKAN, STELAR eLab, myExperiment Manifest Metadata Describes the aggregated resources, their annotations and their provenance Manifest
  39. 39. Manifest Metadata Manifest Construction • Identification – id, title, creator, status…. • Aggregates – list of ids/links to resources • Annotations – list of annotations about resources Manifest Manifest Description • Checklists – what should be there • Provenance – where it came from • Versioning – its evolution • Dependencies – what else is needed Manifest
  40. 40. Manifest Construction Unique identifiers as names for things. doi, epic, orcid, purl, RII, Identifiers.org Mechanism of aggregation to group things together. OAI-ORE Metadata about those things & how they relate to each other. W3C OADM http://w3id.org/ro/
  41. 41. FAIR Manifest Descriptions: Types of RO Progressive Annotation Profiles Checklist Provenance Versioning Dependencies http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf NISO-JATS Dublin Core EFO JERM SBML wfdesc
  42. 42. Checklists aka Reporting Guidelines Consistent Reporting, Standardised Cataloguing, Validation Gamble, Goble, Klyne, Zhao, MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data, IEEE 8th Intl Conf on eScience, 2012 MeanWhealDiameter reports: must include values for the properties: SubjectId, SptSolution, Date, FollowUp should include values for the properties: VariableLabel
  43. 43. Implementation Profiles Research Object Bundle Specification Manifest https://w3id.org/bundle/ doi:10.5281/zenodo.10440 Container Packaging: Zip files, Docker images, BagIt, … Catalogues & Commons Platforms: FAIRDOM SEEK, Farr Commons CKAN, STELAR eLab, myExperiment
  44. 44. RO Unzip • Reproducibility • Versioning • Systematic and extensible metadata collection • Cross platform exchange • Publishing Living Snapshot Sys and Syn Bio Experiments management and publishing
  45. 45. Examples
  46. 46. Sys & Syn Biology Community Standards Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014) Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369 Combine with RO. Standardised metadata & API http://co.mbine.org/documents/archive https://github.com/stain/ro-combine-archive doi:10.5281/zenodo.10439 Martin Scharm Universität Rostock
  47. 47. ATLAS Collider Data Analytics Portable, lightweight application runtime and packaging tool. Image ATLAS and CMS detector data Charles Vardeman, Da Huo University of Notre Dame All data and files of the execution + Instructions convert bundle manifest Relate files and layers Add provenance and annotations Link in other content Exchange Reproducibility Same data Same code Same run time environment Systematic and extensible metadata collection
  48. 48. Computational Workflow Runs workflowrun.prov.ttl (RDF) outputA.txt outputC.jpg outputB/ intermediates/ 1.txt 2.txt 3.txt de/def2e58b-50e2-4949-9980-fd310166621a.txt inputA.txt workflow attribution execution environment Aggregating in Research Object ZIP folder structure (RO Bundle) mimetype application/vnd.wf4ever.robundle+zip .ro/manifest.json URI references Exchange Reproducibility Same data Same code Systematic and extensible metadata collection Workflow Annotation Profile Wf4Ever Project
  49. 49. STELAR Asthma Research e-Lab STELAR e-Lab Requests for data Data Exports Comments, questions ALSPAC MAAS SEATON Ashford On-going data collection STELAR Researchers Isle of Wight Data Collection Methods and Results STELAR Team Farr Institute@Manchester
  50. 50. Farr Institute Commons catalogues over safe havens Exchange Systematic and extensible metadata collection
  51. 51. NIH BD2K Commons and Research Objects Metadata Profiles RO Model API Community IDs RO Model Manifest Profile Implementation Profiles https://datascience.nih.gov/commons
  52. 52. Many Challenges
  53. 53. Many outstanding issues… Social & Cultural Technical Tragedy of the Commons https://doctorwhothing.files.wordpress.com/2014/01/doctor-who-fan-girl-group.jpg
  54. 54. me ME my team close colleagues peers Personal productivity Retention & Reuse Publish driven Public Good Sharing & Reproducibility Access driven [Apologies to Resnick and Malone]
  55. 55. FAIR Reward. Reducing Pain. Cost vs Benefit.
  56. 56. RO Ramps. Born RO. Commodity Tooling, Libraries, Lightweight Making and Auto-making Manifest Descriptions Making Containers Literate Programming, electronic lab notebooks Rendering & Using Manifests
  57. 57. FAIR Citation, credit, tracking • Citation – Resolution and semantics • Tamper-proof currency – Blockchain, Ethereum • RO trajectories – Data trajectories [Missier] – Provenance propagation • Credit trajectories – Micro-credit tracking • Social-political acceptance – All research products valued – FAIR publishing effort recognised • Defend it (snapshot) • Locate it (most recent) • Reuse it (a version, a component) • Credit it (contributory authorship) • Cross link it (connections)
  58. 58. Knowledge Turning with ROs Simple approach, towards transparent FAIR principles https://d2t1xqejof9utc.cloudfront.net/screenshots/pics/1ddf584eb4cf6b1283baf9aa6d380cff/original.jpg
  59. 59. Inspired by Bob Harrison • Incremental shift for infrastructure providers. • Moderate shift for policy makers and stewards. • Paradigm shift for researchers, their institutions and publishers. Knowledge Turning with ROs
  60. 60. All the members of the Wf4Ever team Colleagues in Manchester’s Information Management Group http://www.researchobject.org http://www.wf4ever-project.org http://www.fair-dom.org http://seek4science.org http://rightfield.org.uk http://www.software.ac.uk http://www.datafairport.org Alan Williams Jo McEntyre Norman Morrison Stian Soiland-Reyes Paul Groth Tim Clark Juliana Freire Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Barend Mons Sean Bechhofer Philip Bourne Matthew Gamble Raul Palma Jun Zhao Neil Chue Hong Josh Sommer Matthias Obst Jacky Snoep David Gavaghan Rebecca Lawrence Stuart Owen Finn Bacall
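
Slides 39, 43 and 48 describe a Research Object as a container (for example a ZIP file) holding the aggregated resources plus a manifest at .ro/manifest.json that identifies the object, lists what it aggregates and records annotations. The following Python sketch shows one way such a bundle might be packaged. It is an illustration, not the reference implementation: the mimetype string and the .ro/manifest.json path come from slide 48, while the manifest field names ("aggregates", "annotations", "createdBy"), the context URI and the helper names build_manifest, write_bundle and demo_bundle are assumptions based on one reading of the RO Bundle specification at https://w3id.org/bundle/.

```python
import io
import json
import zipfile

# Media type shown on slide 48 for the RO Bundle ZIP layout.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_manifest(title, creator, resources, annotations):
    """Assemble a manifest: identification, aggregates, annotations.

    Field names are assumptions modelled on the RO Bundle specification;
    check them against the specification before relying on them.
    """
    return {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URI
        "id": "/",
        "title": title,                      # hypothetical identification field
        "createdBy": {"name": creator},
        "aggregates": [{"uri": "/" + path} for path in resources],
        "annotations": annotations,
    }


def write_bundle(target, files, manifest):
    """Write files and the manifest into a ZIP container.

    The mimetype entry is written first and uncompressed so that tools
    can sniff the bundle type from the start of the file.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, data in files.items():
            zf.writestr(path, data)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))


def demo_bundle():
    """Build a small in-memory bundle using file names from slide 48."""
    files = {
        "inputA.txt": b"input data",
        "outputA.txt": b"output data",
        "workflowrun.prov.ttl": b"# provenance trace (placeholder content)",
    }
    # Say that the provenance file describes the output file.
    annotations = [{"about": "/outputA.txt", "content": "/workflowrun.prov.ttl"}]
    manifest = build_manifest("Example workflow run", "A. Researcher",
                              files, annotations)
    buffer = io.BytesIO()
    write_bundle(buffer, files, manifest)
    return buffer.getvalue()
```

Because the container is an ordinary ZIP file, a consumer without any Research Object tooling can still unzip it and read the files, which is the "graceful degradation" asked for on slide 37.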
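
Slide 42 gives a checklist example: MeanWhealDiameter reports must include SubjectId, SptSolution, Date and FollowUp, and should include VariableLabel. A minimal Python sketch of how such a must/should checklist could be applied to a report is shown here. It does not use the MIM vocabulary itself; the Checklist class, the check function and the dictionary representation of a report are hypothetical and serve only to illustrate the distinction between required and recommended properties.

```python
from dataclasses import dataclass, field


@dataclass
class Checklist:
    """Hypothetical checklist: required (must) and recommended (should)."""
    must: list = field(default_factory=list)
    should: list = field(default_factory=list)


# Property names taken from slide 42.
MEAN_WHEAL_DIAMETER = Checklist(
    must=["SubjectId", "SptSolution", "Date", "FollowUp"],
    should=["VariableLabel"],
)


def _missing(report, properties):
    """Return the properties that are absent or have an empty value."""
    return [p for p in properties if report.get(p) in (None, "")]


def check(report, checklist):
    """Validate a report (a plain dict) against a checklist.

    Missing 'must' properties make the report invalid; missing 'should'
    properties are only reported as warnings.
    """
    errors = _missing(report, checklist.must)
    warnings = _missing(report, checklist.should)
    return {"valid": not errors, "errors": errors, "warnings": warnings}
```

A report with all four required properties but no VariableLabel would pass with one warning, while a report lacking Date would fail; the values themselves are not inspected in this sketch.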
