Research Objects for FAIRer Science

Research Objects for FAIRer Science - Shared keynote presentation at the VIVO and Science of Team Science (SciTS) Joint Conference, 6-8 August 2014, Austin, Texas


  1. 1. Research Objects for FAIRer Science Professor Carole Goble CBE FREng FBCS The University of Manchester, UK carole.goble@manchester.ac.uk VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
  2. 2. Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct ….. papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension Jill Mesirov Accessible Reproducible Research Science 22 Jan 2010: 327(5964): 415-416 DOI: 10.1126/science.1179653 Virtual Witnessing* *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
  3. 3. Virtual Witnessing* *Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer. Capturing, representing, sharing the information needed to understand how a research result came about. Context of results • Inputs, outputs, process… Context of resources • Instruments, data, software, people…
  4. 4. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 datasets data collections standard operating procedures software algorithms configurations tools and apps codes workflows scripts code libraries services, system software infrastructure, compilers hardware Morin et al. Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160 Ince et al. The case for open computer programs, Nature 482, 2012
  5. 5. “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.” Phil Bourne NIH BigWig for Data Science
  6. 6. a reproducibility paradox big, fast, complicated, multi-step, multi-type, multi-field greater expectations of reproducibility DIY publishing greater access
  7. 7. Systems Biology Collaborations Modelling Cycle 45 organisations 112 organisations
  8. 8. Data Models Articles External Databases http://www.seek4science.org Metadata http://www.isatools.org Ontology-driven Aggregated Content Infrastructure (Framework) for building Sys Bio Commons share and interlinking multi-stewarded, mixed, methods, models, data, samples… Standards DCAT FOAF Yellow Pages
  9. 9. Yellow Pages Careful Sharing Options
  10. 10. Commons
  11. 11. Investigations Assays Studies Towards Interoperable Bioscience Data, Nature Genetics, 2012 Standards, Structure, Interlink Just Enough Results Model for things produced and used in experiments
  12. 12. Construction data Validation data Metabolomics Mass Spec Transcriptomics Proteomics Fluxomics Publications Mix of locally & remotely hosted content Open Modelling Exchange Format Archive Wolstencroft et al, Proc ISWC 2013 Just Enough Results Model for stuff in experiments Common elements Data type specific elements
  13. 13. Experimentalists, modellers & developers Cross-site, cross project collaboration Knowledge network Building the System: Building a Cult TRUST VISION SETTING EXPECTATIONS Drink together Work together
  14. 14. • Collaboration – Complementarity correlation • Modellers share more than Experimentalists • Experimentalists reuse models more than Modellers • Active enclave sharing • Public sharing tricky even after publication, bribery and threats • Data Hugging, Flirting and Voyeurism
  15. 15. • Playground rules apply • Fluid, transient collaborations > membership mgt pain in a*se • Shameless exploitation of PI competitiveness & vanity • PI & Funder leadership • Pan project spawned collaborations –YES!!!! • But not necessarily visible to us.
  16. 16. Data discovery Data assembly, cleaning, and refinement Ecological Niche Modeling Statistical analysis Data collection Insights Scholarly Communication & Reporting Enclosed sea problem (Ready et al., 2010) Pilumnus hirtellus Scientific Workflows
  17. 17. BioSTIF method instruments and laboratory materials Data discovery Data assembly, cleaning, and refinement Ecological Niche Modeling Statistical analysis Data collection Insights Scholarly Communication & Reporting Method Matters!
  18. 18. Workflow Commons
  19. 19. "Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al
  20. 20. 1st International Workshop on Social Object Networks (SocialObjects 2011), Boston, October 9th 2011. Find, Click ‘n’ Go File ‘n’ Forget Specialist Curators
  21. 21. 24 Properties What would you ask a publication if you could? Identity and Description Uniqueness Authenticity Who are you ? Where and when were you born ? Who were your parents (creators) ? Review, Reuse, and Repurpose For which purpose were you conceived and have been used ? Inspection Visualization Annotations What do you have inside ? Representation How is your content structured ? Access Rights May I access all your parts ? Adaptability Which parts can I replace ? Evolution & Versioning Provenance What have they done to you ? Who and When ? Why did they do that ? Quality Why are you relevant to me ? Can I believe what you are saying or trust your results ? Reproducibility Do you still produce the same results ? Fitness Are you still working ? How could I repair you ? Credit and attribution How could I thank you ? How could I talk about you ?
  22. 22. From Manuscripts to “Research Objects” A meme The multi-dimensional paper Packs
  23. 23. Packs www.datafairport.org
  24. 24. What is a Research Object?
  25. 25. Howard Ratner, STM Innovations Seminar 2012 was: Chair STM Future Labs Committee, CEO EVP Nature Publishing Group, now: Director of Development for CHORUS (Clearinghouse for the Open Research of US) http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5 http://www.myexperiment.org/packs/196.html
  26. 26. What The Commons* Is and Is Not  Is Not: – A database – Confined to one physical location – A new large infrastructure – Owned by any one group  Is: – A conceptual framework – Analogous to the Internet – A collaboratory – A few shared rules • All research objects have unique identifiers • All research objects have limited provenance Philip E. Bourne Ph.D. Associate Director for Data Science, National Institutes of Health http://www.slideshare.net/pebourne *The NIH BD2K Commons Framework $100 million in 2015
  27. 27. Social Objects carriers of discourse
  28. 28. http://www.researchobject.org/ A Framework to Bundle and Relate multi-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols. Carriers of Research Context Outputs are first class citizens to be managed, credited and tracked: data, software Research Objects
  29. 29. Links • Recording & linking together the components of an experiment • Linking across experiments.
  30. 30. Preserve Archive Reproduce* Recompute Reuse Train & Explain Exchange Remix Fix * a word that means many things…..
  31. 31. re-compute replicate rerun repeat re-examine repurpose recreate reuse restore reconstruct review regenerate revise recycle regenerate the figure redo Results may vary
  32. 32. repeat replicate Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227. Methods (techniques, algorithms, spec. of the steps) Materials (datasets, parameters, algorithm seeds) Experiment Instruments (codes, services, scripts, underlying libraries) Laboratory (sw and hw infrastructure, systems software, integrative platforms) Setup reuse reproduce Executable Research Object
  33. 33. same experiment same set up same lab same experiment same set up different lab same experiment different set up different experiment some of same Validate reuse reproduce repeat replicate http://www.biomedcentral.com/biome/carole-goble-on-reproducible-research-what-it-really-means-how-to-reach-it/
  34. 34. Design Execution Result Analysis Collection Publish / Report Peer Review Peer Reuse Modelling Can I repeat & defend my method? Can I review / reproduce and compare my results / method with your results / method? Can I review / replicate and certify your method? Can I transfer your results into my research and reuse this method? * Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010) Research Report Prediction Monitoring Cleaning
  35. 35. specialist codes libraries, platforms, tools services (cloud) hosted services commodity platforms data collections catalogues software repositories my data my process my codes integrative frameworks gateways
  36. 36. data carpentry http://software-carpentry.org/
  37. 37. Components & Dependencies • 35 kinds of annotations • 5 Main Workflows • 14 Nested Workflows • 25 Scripts • 11 Configuration files • 10 Software dependencies • 1 Web Service • Dataset: 90 galaxies observed in 3 bands • Multiple platforms • Multiple systems José Enrique Ruiz (IAA-CSIC) Galaxy Luminosity Profiling
  38. 38. Executable Instrument Entropy Zhao, Gomez-Perez, Belhajjame, Klyne, Garcia-Cuesta, Garrido, Hettne, Roos, De Roure and Goble. Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012 Mitigate Detect, Repair Preserve Partial replication Approx. reproduction Verification Benchmarks
  39. 39. Executable Instrument Entropy Prepare to Repair Reproducibility by Inspection Read It Reproducibility by Invocation Run It Document Instrument
  40. 40. [Adapted Freire, 2013] provenance gather dependencies capture steps track & keep results portability variability tolerance preservation packaging versioning open accessible available machine actionable description intelligible machine-readable
  41. 41. [Adapted Freire, 2013] Authoring Exec. Papers Link docs to experiment Sweave Provenance Tracking, Versioning Replay, Record, Repair Workflows, makefiles ProvStore provenance gather dependencies capture steps track & keep results open accessible available machine actionable description intelligible machine-readable
  42. 42. [Adapted Freire, 2013] packaging portability variability tolerance preservation provenance gather dependencies capture steps track & keep results versioning host service Open Source/Store Sci as a Service Integrative fws Virtual Machines Recompute, limited installation, Black Box Byte execution, copies Descriptive read, White Box Archived record Read & Run, Co-location No installation Portable Package White Box, Installation Archived record
  43. 43. [Adapted Freire, 2013] host service ReproZip packaging portability variability tolerance preservation provenance gather dependencies capture steps track & keep results versioning
  44. 44. No Green Fields No One System Find Access Interop Reuse Porting across Platforms Exchange between Systems Comparing across Labs
  45. 45. Identity Description Packaging Refer to aggregations and their resource contents Interpretation: What does it mean? How can I compare with others? How is it linked together and linked to others? Describe aggregation structure and its constituent parts Container regardless of host FAIR RO Core Model manifest Uniform and first class handling of diverse types (data, software, workflows…)
  46. 46. Identity Annotation Aggregation FAIR RO Core Model DOIs URIs Handles ORCID W3C OAM OAI-ORE Open Annotation Model OAI Object Reuse and Exchange
  47. 47. Identity Annotation Aggregation FAIR RO Core Model DOIs URIs Handles ORCID Aggregations Resource maps Proxies Annotation first class and stand-off Identity persistence and resolution Citation W3C OAM OAI-ORE
  48. 48. Identity Annotation Aggregation FAIR RO Core Platforms DOIs URIs Handles ORCID Data Citation Implementation W3C OAM OAI-ORE
  49. 49. Distributed Third Party Tenancy Alien Store Aggregation Carrier of Research Context • Identifiable, citable, resolvable • Uniform Management • Mixed Stewardship • Decay & Graceful Degrade • Content & Aggregation Lifecycles • Annotations • Manifests, Recipes, Permissions, Discourse Aggregations • Dispersed / Encapsulated • External (linked) / Local • Mixed types • Blackboxes • Virtual / Materialised Content Resources • Aggregations themselves • In many aggregations • Virtual / Materialised • Open / Closed
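Slides 45-49 describe the core of the model in prose. As a rough illustration only (the class and field names below are invented for this sketch, not the RO vocabulary terms), the three ingredients can be pictured as a small data structure: the aggregation carries its own identity, it lists constituent resources that may be local or external, and annotations are stand-off, first-class records that target those resources.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified rendering of the RO core model:
# identity + aggregation + stand-off annotation.

@dataclass
class Resource:
    uri: str                                 # identity: URI, DOI, Handle...
    media_type: str = "application/octet-stream"

@dataclass
class Annotation:
    uri: str                                 # annotations are themselves identified, first-class things
    target: str                              # URI of the aggregated resource (or the RO) it describes
    body: str                                # URI of the annotation content (e.g. an RDF graph)
    creator: str = ""                        # e.g. an ORCID

@dataclass
class ResearchObject:
    uri: str                                               # identity of the aggregation itself
    aggregates: List[Resource] = field(default_factory=list)     # local or external (linked) resources
    annotations: List[Annotation] = field(default_factory=list)  # stand-off descriptions

    def annotate(self, target: str, body: str, creator: str = "") -> Annotation:
        ann = Annotation(uri=f"{self.uri}/ann/{len(self.annotations) + 1}",
                         target=target, body=body, creator=creator)
        self.annotations.append(ann)
        return ann

# Example: an RO aggregating a dataset and a workflow, with one provenance annotation.
ro = ResearchObject(uri="https://example.org/ro/1")
ro.aggregates.append(Resource("https://example.org/ro/1/data/galaxies.csv", "text/csv"))
ro.aggregates.append(Resource("https://example.org/workflows/2659"))          # external, linked resource
ro.annotate(target="https://example.org/ro/1/data/galaxies.csv",
            body="https://example.org/ro/1/annotations/provenance.ttl",
            creator="https://orcid.org/0000-0000-0000-0000")
```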
  50. 50. TARDIS: Time and Relative Dimension in Space
  51. 51. RO Model Ontology
  52. 52. • RO Management – Transportation / Access / Citation – Id location of RO “container” – Provenance of RO & contents – Behaviour/lifecycle of RO & contents – Policies • RO Interpretation – What the RO and its content mean – How they can be compared and validated – How they can be used, executed, linked • Interpretation variations – Type (e.g. Workflows) – Discipline (e.g. Biology) – Task (e.g. Discovery, Execution) – Activity (e.g. Experiment) Progression Levels Management and Interpretation for Integrated Applications
  53. 53. Progression Levels Management and Interpretation for Integrated Applications • RO Management – Transportation / Access / Citation – Id location of RO “container” – Provenance of RO & contents – Behaviour/lifecycle of RO & contents – Policies • RO Interpretation – What the RO and its content mean – How they can be compared and validated – How they can be used, executed, linked • Interpretation variations – Type (e.g. Workflows) – Discipline (e.g. Biology) – Task (e.g. Discovery, Execution) – Activity (e.g. Experiment)
  54. 54. Checklists Versioning Provenance Dependencies More Stakeholders & Services Citation minimum More specialised detail Fewer but more specialised stakeholders & services Annotation Profiles. Depth: how deeply described Coverage: how much is covered. Progression levels Semantic Framework
  55. 55. Checklists Versioning Provenance Dependencies NISO-JATS EXPO, ISA JERM, OBI MIAME, SBML GIT MIM Ontology PROV PAV VoID Puppet Docker Make PAV RO Model ro evo wfprov wfdesc SysBio Workflows DCAT Annotation Profiles. Depth: how deeply described Coverage: how much is covered. Progression levels Semantic Framework Experiment VIVO-ISF DC
  56. 56. Checklists aka Minimum Information Models  Safety, quality, consistency  Validation, monitoring  Common in experimental science  Checklists defined in terms of the RO model and its annotations  Services execute against model and an RO’s annotations Zhao et al. A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Intl Workshop on Linked Science, 2013 Minim Checklist Ontology to describe checklists Must, Should… Cardinalities… Rules… http://purl.org/net/mim/ns
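A toy sketch of the checklist idea from slide 56, in Python with invented names (the real approach expresses checklists in the Minim ontology and evaluates them against an RO's RDF annotations): MUST and SHOULD requirements are checked against the facts recorded about a Research Object, producing a simple completeness report.

```python
# Illustrative sketch only (names invented): evaluate a MUST/SHOULD checklist
# against the set of annotation "facts" recorded for a Research Object.
# The real mechanism uses the Minim ontology and queries over RO annotations.

MUST, SHOULD = "MUST", "SHOULD"

checklist = [
    (MUST,   "has_creator"),
    (MUST,   "has_title"),
    (MUST,   "aggregates_workflow"),
    (SHOULD, "has_provenance_trace"),
    (SHOULD, "has_example_inputs"),
]

def evaluate(checklist, facts):
    """Return (all_musts_met, report) for a set of annotation facts."""
    report, all_musts_met = [], True
    for level, requirement in checklist:
        satisfied = requirement in facts
        if level == MUST and not satisfied:
            all_musts_met = False
        report.append((level, requirement, "satisfied" if satisfied else "missing"))
    return all_musts_met, report

# Facts extracted (in the real system, by querying an RO's annotations).
ro_facts = {"has_creator", "has_title", "aggregates_workflow", "has_example_inputs"}

complete, report = evaluate(checklist, ro_facts)
for level, requirement, status in report:
    print(f"{level:6} {requirement:22} {status}")
print("All MUST requirements satisfied:", complete)
```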
  57. 57. Towards Smart Integrated Applications & Mediation 1. Id & Cite fluid things 2. First class citizenship & uniform handling of artifacts 3. Compound 4. Mixed, leaky Containers 5. Span outcomes, evolve outputs, emergence 6. Layered interpretation and management profiles using standards 7. Machine-processable 8. Technology Independent Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004
  58. 58. Towards Smart Integrated Applications & Mediation Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004 1. Id & Cite fluid things 2. First class citizenship & uniform handling of artifacts 3. Compound 4. Mixed, leaky Containers 5. Span outcomes, evolve outputs, emergence 6. Layered interpretation and management profiles using standards 7. Machine-processable 8. Technology Independent
  59. 59. Research Objects Framework a systematic approach to representing a different unit of scholarship “development” view “logical” view “process” view “physical” view SERVICES POLICIES LIFECYCLES METADATA PROFILES
  60. 60. Let's Bake Research Objects!
  61. 61. Open Archival Information System Pilot: ROs are “Information Packages” RO Manager RODL
  62. 62. • A single, transferable object encapsulates description and resources – Download, transfer, publish • ZIP-based format + manifest describes aggregation and annotations – Unpack with standard tooling • JSON-LD for manifest – Lightweight linked-data format – Use JSON tooling and services Baking with off the shelf platforms OMEX archive bundle Adobe UCF ORE PROV ODF
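A minimal sketch of the "ZIP plus JSON-LD manifest" packaging described on slide 62, using only the Python standard library. The file layout and property names below are assumptions loosely modelled on the RO bundle idea (a manifest kept under a .ro/ folder), not the exact specification.

```python
import json
import zipfile

# Sketch of packing a Research Object as a ZIP archive with a JSON-LD manifest.
# Paths and property names are indicative only; placeholder content stands in
# for the real resources, which would normally be copied in from disk.

manifest = {
    "@context": "https://w3id.org/bundle/context",       # assumed context URI
    "id": "/",
    "createdBy": {"orcid": "https://orcid.org/0000-0000-0000-0000"},
    "aggregates": [
        {"uri": "data/galaxies.csv", "mediatype": "text/csv"},
        {"uri": "workflow/profiling.t2flow"},
        {"uri": "http://www.myexperiment.org/packs/196.html"},   # external, linked resource
    ],
    "annotations": [
        {"about": "workflow/profiling.t2flow", "content": "annotations/wfdesc.ttl"},
    ],
}

with zipfile.ZipFile("example.ro.zip", "w", zipfile.ZIP_DEFLATED) as bundle:
    bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    bundle.writestr("data/galaxies.csv", "id,band,luminosity\n")          # placeholder content
    bundle.writestr("workflow/profiling.t2flow", "<workflow/>")           # placeholder content
    bundle.writestr("annotations/wfdesc.ttl", "# workflow description\n") # placeholder content

# The result is a single transferable object: unpack with standard ZIP tooling,
# read the manifest with ordinary JSON libraries.
```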
  63. 63. • Work with local folder structure. – Version: GitHub. – Metadata: Local tooling – Metadata about aggregation and its resources: “hidden folder” • Zenodo/figshare pull snapshot from GitHub – DOIs for aggregation – new DOIs: release cycles Baking with off the shelf platforms http://dx.doi.org/10.6084/m9.figshare.1031591
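And a corresponding sketch of the "hidden folder" approach from slide 63: aggregation metadata sits in a dot-directory next to the working files, so an ordinary GitHub repository, and any Zenodo/figshare snapshot made from a release of it, carries the description along. Folder and file names here are illustrative assumptions.

```python
import json
from pathlib import Path

# Sketch: record aggregation metadata in a hidden ".ro/" folder inside an
# ordinary project directory, so that versioning (git/GitHub) and archival
# snapshots (Zenodo/figshare, which mint a DOI per release) carry it along.

project = Path("my-analysis")
(project / "data").mkdir(parents=True, exist_ok=True)
(project / ".ro").mkdir(exist_ok=True)
(project / "data" / "galaxies.csv").write_text("id,band,luminosity\n")   # placeholder working file

metadata = {
    "id": "my-analysis",
    "title": "Galaxy luminosity profiling analysis",
    # Aggregate everything in the working tree except the hidden metadata folder.
    "aggregates": [str(p.relative_to(project))
                   for p in project.rglob("*")
                   if p.is_file() and ".ro" not in p.parts],
    "doi": None,   # filled in once Zenodo/figshare mints one for a release
}

(project / ".ro" / "metadata.json").write_text(json.dumps(metadata, indent=2))
```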
  64. 64. FARSITE coded descriptions of clinical study cohorts an NHS tool to assess the feasibility of gathering a cohort packages codes, study, and metadata Home Baking
  65. 65. In the Wild Safari
  66. 66. integrated database and journal http://www.gigasciencejournal.com galaxy.cbiit.cuhk.edu.hk [Peter Li]
  67. 67. Nanopub: represents structured data along with its provenance in a single publishable and citable entry Galaxy workflows: re-enact the analysis Research Object: aggregates the (digital) resources contributing to findings of (computational) research (results, data and software) as citable compound digital objects http://isa-tools.github.io/soapdenovo2/ http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/ [Alejandra Gonzalez-Beltran Philippe Rocca-Serra]
  68. 68. what’s the least we can do? how might ROs be minted and used by science teams? how might ROs be implemented and used by developer teams? Standards Models Platforms Id Schemes Resolution Light touch Extensible Infiltration Mapping Making, Curating, Using Nudging Sharing Linking Infiltration Embedding into and changing work practices TOOLS Citing Technical Social Reward Mixed stewardship Citation Schemes Fragility
  69. 69. [Norman Morrison]
  70. 70. (meta)Data Capture Platforms Process Capture Platforms
  71. 71. Stealthy not Sneaky to reduce the friction instrument the world Incremental JIJIT not JIC Focus on Personal Productivity not Public Good Auto-magical From made reproducible to born reproducible What’s the least we can do?
  72. 72. Knowledge Turns Transportation & Mediation Unit of Scholarly Currency Context, Comparison Distributed: Search, Discover, Index, Harvest, Port Research Turns Release model: Evolution, Emergence, Discourse, Comparison, Historical review Forks, Merges & Fixivity Flow across groups, projects and articles Anti-Salami, Threaded Publications Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012; Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013 Profile Focus Body of knowledge around methods, workflows, software, data, person, rather than publication. First class citation, credit and respect
  73. 73. Open Research Practice is (increasingly) like Open Source Software Practice. (Which we know a lot about)
  74. 74. FAIR research practice benefits from a shared and principled approach for identification, aggregation and annotation of research components of all kinds. – Using existing standards, vocabularies, frameworks, platforms, infrastructures. Using linked data and semantic interoperability VIVO - to represent the full context of researchers’ work. SciTS – to study the research process and research collaboration
  75. 75. http://www.researchobject.org
  76. 76. • Barend Mons • Sean Bechhofer • Philip Bourne • Matthew Gamble • Raul Palma • Jun Zhao • Alan Williams • Stian Soiland-Reyes • Paul Groth • Tim Clark • Juliana Freire • Alejandra Gonzalez-Beltran • Philippe Rocca-Serra • Ian Cottam All the members of the Wf4Ever team iSOCO: Intelligent Software Components S.A., Spain University of Manchester, School of Computer Science, Manchester, United Kingdom University of Oxford, Department of Zoology, Oxford, UK Poznan Supercomputing and Networking Center. Poznan, Poland IAA: Instituto de Astrofísica de Andalucía, Granada, Spain Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands Colleagues in Manchester’s Information Management Group RO Advisory Board Members http://www.researchobject.org http://www.wf4ever-project.org