2012 03-28 Wf4ever, preserving workflows as digital research objects

  1. Wf4Ever: Preserving workflows as digital Research Objects. Stian Soiland-Reyes, myGrid, University of Manchester. EGI Community Forum 2012, Workflow Systems workshop, Leibniz Supercomputing Centre, Munich, 2012-03-28
  2. My background. Taverna: Scientific Workflow Management System; ~85000 downloads; EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-INSPiRE… myExperiment: Web 3.0 virtual environment, library and social network for workflows; ~5000 registered users; ~2200 workflows; ~21 different systems
  3. “A biologist would rather share their toothbrush than their gene name” Mike Ashburner and others Professor in Dept of Genetics, University of Cambridge, UK
  4. • “Facebook for Scientists” ...but different to Facebook! • A probe into researcher behaviour • A repository of research methods • Open source (BSD) Ruby on Rails app • A social network of people and things • REST and SPARQL, Linked Data • A Social Virtual Research Environment • Influenced BioCatalogue, MethodBox and SysMO-SEEK. myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs
  5. • Workflow Preservation • Research Objects • Provenance • Recommendation • Astronomy and Genomics
  6. Wf4Ever Challenges: Preservation of scientific workflows » Scientific workflows enable automation of scientific methods in data-intensive science and encourage best practices to be shared » Workflows need to be preserved for › Reuse, fundamental for incremental scientific development › Method reproducibility, key for credit and publication » Workflow preservation is complex! » Heterogeneous types of information need to be aggregated, including workflows and related resources forming research objects » Research objects need to be trusted and understandable n years from now » Social aspects need to be addressed in order to support reuse in scientific communities
  7. The R.* dimensions. Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes. Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object. Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later. Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed. Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years. Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable. Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results. Respectful. Explicit representations of the provenance, lineage and flow of intellectual property. “Replacing the Paper: The Twelve Rs of the e-Research Record” on
  8. Wf4Ever Forms of decay. Workflow Decay • Service decay: flux/decay/unavailability • Data decay: formats/ids/standards • Infrastructure decay: platform/resources. Experiment Decay • Methodological changes • New technologies • New resources/components • New data
  9. Preservation, Conservation, Recreating. Preserving: Archived Record; Fixed Snapshots; Review; Rerun & Replay. Conserving: Active Instrument; Live; Rerun & Reuse; Repair & Restore. Recreating: Archived Record; Active Instrument; Live; Rebuild; Recycle; Repurpose
  10. Workflow Decay: decay at different abstraction levels [figure labels: Redo, Flux]
  11. Research objects
  12. Research Objects as Social Objects
  13. Research Object model core (simplified): ore:aggregates, ro:ResearchObject, ro:Resource, ore:isDescribedBy, ro:Manifest, wfdesc:Workflow, ro:annotatesAggregatedResource, ro:AggregatedAnnotation. Note: This figure shows a simplified view of the RO core. RO specification:
  14. Research Object model core
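The terms named on the simplified RO core slide can be read as a small graph of subject, predicate, object statements. The following is a minimal plain-Python sketch of that reading, not the official RO serialization: the class, its methods and all URIs are hypothetical, and the choice to have the Research Object aggregate its annotations is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ResearchObject:
    """Toy Research Object that records its structure as triples."""

    uri: str
    manifest_uri: str
    triples: List[Triple] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The RO is described by its manifest.
        self.triples.append((self.uri, "rdf:type", "ro:ResearchObject"))
        self.triples.append((self.manifest_uri, "rdf:type", "ro:Manifest"))
        self.triples.append((self.uri, "ore:isDescribedBy", self.manifest_uri))

    def aggregate(self, resource_uri: str, rdf_type: str = "ro:Resource") -> None:
        # The RO aggregates a resource, for example a wfdesc:Workflow.
        self.triples.append((self.uri, "ore:aggregates", resource_uri))
        self.triples.append((resource_uri, "rdf:type", rdf_type))

    def annotate(self, annotation_uri: str, target_uri: str) -> None:
        # Only resources already aggregated by this RO can be annotated.
        if (self.uri, "ore:aggregates", target_uri) not in self.triples:
            raise ValueError(f"{target_uri} is not aggregated by {self.uri}")
        # Assumption: the annotation is itself aggregated by the RO.
        self.triples.append((self.uri, "ore:aggregates", annotation_uri))
        self.triples.append((annotation_uri, "rdf:type", "ro:AggregatedAnnotation"))
        self.triples.append(
            (annotation_uri, "ro:annotatesAggregatedResource", target_uri)
        )


# Hypothetical URIs, for illustration only.
ro = ResearchObject("ro/", "ro/.ro/manifest.rdf")
ro.aggregate("ro/workflow.t2flow", "wfdesc:Workflow")
ro.annotate("ro/.ro/annotations/1", "ro/workflow.t2flow")

for subject, predicate, obj in ro.triples:
    print(subject, predicate, obj)
```

A real implementation would more likely build these statements with an RDF library and full namespace URIs; plain tuples are used here only to keep the sketch dependency-free.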
  15. RO model: Workflow Description
  16. Workflow Provenance (wfprov)
  17. Technical infrastructure • Models: Semantic Web Encoding • Research Object • Annotation • Provenance • Evolution and Versioning • Services: Web APIs, REST services • Foundational, Extension, User • APIs, Architecture • Principles • Map into standards • Adopt standards • Lightweight components • Ecosystem • Command line • Portal • Third party systems
  18. The Wf4Ever Proposal: Services, User Clients, Extension Services, Foundation Services
  19. Wf4Ever Reference Implementation Prototype, Dec 2011. Access & Usage Clients, Dropbox Client, RO Portal, RO Manager Tool, ROBox, Data Management & Analysis Services, Stability Evaluation, Completeness Evaluation, Recommender, Storage Services, Lifecycle Services, Taverna Workflow Mgmt System, RO Digital Library
  20. Roadmap Year 1 (Dec 2010 to Dec 2011) » Exploration (2011) › Problem specification and requirements identification › Better understanding of workflow preservation needs from the domains (what does it mean to preserve a scientific workflow?) › Proofs of concepts › Preliminary models, components, and integrated reference implementation › Result identification
  21. Roadmap Year 2 (Dec 2011 to Dec 2012) » Realization/validation (2012) › Validate the models, architectures and software in practice › Distributed components with different access/security arrangements, forming REST APIs and specifications › RO Content Campaign: Generate 1000s of ROs › First productization phase: Stable releases of models and reference implementation › Decay monitoring and notification (why my workflow is no longer stable), reacting to decay, attribution and credit support beyond recommendation. Detailed use of provenance › Execution and interoperability support (SHIWA integration)
  22. Roadmap Year 3 (Dec 2012 to Dec 2013) » Exploitation (2013) › Final productization phase › Deployment in user environments and systems, enhanced with workflow preservation capabilities › RO-enabled myExperiment › RO-enabled Galaxy › RO-enabled Dataverse › … and more! › Deployment in publishers e.g. Elsevier, Digital Science, GigaScience
  23. Collaborations and impact » SHIWA – Sharing Interoperable Workflows » Publishers/journals: Elsevier, GigaScience (by BGI) » OpenPHACTS (nanopublications) » SCAPE (dataset preservation) » BioVeL (biodiversity - species preservation!) » Dataverse (data repository) » Galaxy (workflow system for genomics) » GenomeSpace (data integration platform)
  24. Thank you! Any Questions? This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.