2012 03-28 Wf4ever, preserving workflows as digital research objects



Presented on 2012-03-28 at EGI Community Forum 2012, Munich.







  1. 1. Wf4Ever: Preserving workflows as digital Research Objects
     Stian Soiland-Reyes, myGrid, University of Manchester
     EGI Community Forum 2012, Workflow Systems workshop
     Leibniz Supercomputing Centre, Munich, 2012-03-28
  2. 2. My background
     Taverna - Scientific Workflow Management System (http://www.taverna.org.uk/)
       • ~85000 downloads
       • EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-INSPiRE…
     myExperiment - Web 3.0 virtual environment, library and social network for workflows (http://www.myexperiment.org/)
       • ~5000 registered users
       • ~2200 workflows
       • ~21 different systems
  3. 3. “A biologist would rather share their toothbrush than their gene name.”
     Mike Ashburner and others. Professor in Dept of Genetics, University of Cambridge, UK
  4. 4. http://www.myexperiment.org/
       • “Facebook for Scientists” ...but different to Facebook!
       • A repository of research methods
       • A social network of people and things
       • A Social Virtual Research Environment
       • A probe into researcher behaviour
       • Open source (BSD) Ruby on Rails app
       • REST and SPARQL, Linked Data
       • Influenced BioCatalogue, MethodBox and SysMO-SEEK
     myExperiment currently has 5378 members, 292 groups, 2273 workflows, 534 files and 217 packs
  5. 5. http://www.wf4ever-project.org/
       • Workflow Preservation
       • Research Objects
       • Provenance
       • Recommendation
     Astronomy and Genomics
  6. 6. Wf4Ever Challenges: Preservation of scientific workflows
     » Scientific workflows enable automation of scientific methods in data-intensive science and encourage best practices to be shared
     » Workflows need to be preserved for
       › Reuse, fundamental for incremental scientific development
       › Method reproducibility, key for credit and publication
     » Workflow preservation is complex!
     » Heterogeneous types of information need to be aggregated, including workflows and related resources, forming research objects
     » Research objects need to be trusted and understandable n years from now
     » Social aspects need to be addressed in order to support reuse in scientific communities
  7. 7. The R.* dimensions
       • Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
       • Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
       • Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
       • Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
       • Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
       • Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
       • Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
       • Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
     Source: “Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/
  8. 8. Wf4Ever Forms of decay
     Workflow Decay
       • Service decay
         • Flux/decay/unavailability
       • Data decay
         • Formats/ids/standards
       • Infrastructure decay
         • Platform/resources
     Experiment Decay
       • Methodological changes
       • New technologies
       • New resources/components
       • New data
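The workflow decay categories on this slide lend themselves to a simple mechanical report. The following is a minimal sketch, not anything taken from Wf4Ever: the `Dependency` record, the `decay_report` function and the example dependency names are all hypothetical, and the availability flags are assumed to come from probes performed elsewhere.

```python
from dataclasses import dataclass
from enum import Enum


class Decay(Enum):
    """Workflow decay categories named on the slide."""
    SERVICE = "service decay"
    DATA = "data decay"
    INFRASTRUCTURE = "infrastructure decay"


@dataclass(frozen=True)
class Dependency:
    """Hypothetical record of one thing a workflow relies on."""
    name: str
    kind: Decay        # category that applies if this dependency fails
    available: bool    # outcome of a probe performed elsewhere


def decay_report(dependencies):
    """Group the names of unavailable dependencies by decay category."""
    report = {}
    for dep in dependencies:
        if not dep.available:
            report.setdefault(dep.kind, []).append(dep.name)
    return report


# Illustrative dependencies only; none of these come from the talk.
deps = [
    Dependency("sequence alignment web service", Decay.SERVICE, available=False),
    Dependency("input catalogue in an old format", Decay.DATA, available=True),
    Dependency("compute platform", Decay.INFRASTRUCTURE, available=False),
]
report = decay_report(deps)
for kind, names in report.items():
    print(f"{kind.value}: {', '.join(names)}")
```

Keeping the probe separate from the report means the same grouping works whether availability is checked by an HTTP request, a format validator or a manual review.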
  9. 9. Preservation, Conservation, Recreating
     Preserving
       • Archived Record
       • Fixed Snapshots
       • Review
       • Rerun & Replay
     Conserving
       • Active Instrument
       • Live
       • Rerun & Reuse
       • Repair & Restore
     Recreating
       • Archived Record
       • Active Instrument
       • Live
       • Rebuild, Recycle, Repurpose
  10. 10. Workflow Decay: Decay at different abstraction levels
      [Figure: workflow abstraction levels, labelled Redo, Flux, Flux, Flux. Source: http://www.gridworkflow.org/kwfgrid/gwes/docs/]
  11. 11. Research objects
  12. 12. Research Objects as Social Objects
  13. 13. Research Object model core (simplified)
      Namespace: http://purl.org/wf4ever/ro#
      [Figure terms: ore:aggregates, ro:ResearchObject, ro:Resource, ore:isDescribedBy, ro:Manifest, wfdesc:Workflow, ro:annotatesAggregatedResource, ro:AggregatedAnnotation]
      Note: This figure shows a simplified view of the RO core.
      RO specification: http://wf4ever.github.com/ro/
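To make the terms on this slide concrete, here is a minimal sketch that writes a tiny Research Object description as N-Triples using only the Python standard library. The class and property names are the ones listed on the slide; how they connect is one reading of the simplified figure, and the `http://example.org/` identifiers and file names are hypothetical. The `ore:` prefix is assumed to be the OAI-ORE terms namespace.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RO = "http://purl.org/wf4ever/ro#"
ORE = "http://www.openarchives.org/ore/terms/"
WFDESC = "http://purl.org/wf4ever/wfdesc#"

# Hypothetical identifiers, chosen only for illustration.
BASE = "http://example.org/ro/demo/"
ro_uri = BASE
manifest_uri = BASE + "manifest"
workflow_uri = BASE + "workflow.t2flow"
annotation_uri = BASE + "annotation-1"


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # The Research Object and the manifest that describes it.
    triple(ro_uri, RDF_TYPE, RO + "ResearchObject"),
    triple(ro_uri, ORE + "isDescribedBy", manifest_uri),
    triple(manifest_uri, RDF_TYPE, RO + "Manifest"),
    # An aggregated resource that is also a workflow.
    triple(ro_uri, ORE + "aggregates", workflow_uri),
    triple(workflow_uri, RDF_TYPE, RO + "Resource"),
    triple(workflow_uri, RDF_TYPE, WFDESC + "Workflow"),
    # An aggregated annotation about that resource.
    triple(ro_uri, ORE + "aggregates", annotation_uri),
    triple(annotation_uri, RDF_TYPE, RO + "AggregatedAnnotation"),
    triple(annotation_uri, RO + "annotatesAggregatedResource", workflow_uri),
]
print("\n".join(triples))
```

A real manifest would normally be produced with an RDF library and checked against the RO specification; plain string formatting is used here only to keep the sketch dependency-free.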
  14. 14. Research Object model core
      Namespace: http://purl.org/wf4ever/ro#
  15. 15. RO model: Workflow Description
      Namespace: http://purl.org/wf4ever/wfdesc#
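The transcript does not preserve the figure for this slide, so the following is only a plain-Python approximation of the kind of structure a workflow description captures: named processes connected by data links. The `Workflow` and `DataLink` classes, their fields and the `execution_order` method are hypothetical illustrations, not the wfdesc vocabulary itself.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class DataLink:
    """Hypothetical link carrying data from one process to another."""
    source: str  # name of the process producing the data
    sink: str    # name of the process consuming it


@dataclass
class Workflow:
    """Hypothetical workflow description: processes plus data links."""
    name: str
    processes: list = field(default_factory=list)
    datalinks: list = field(default_factory=list)

    def execution_order(self):
        """Return process names so that every source precedes its sinks."""
        predecessors = {p: set() for p in self.processes}
        for link in self.datalinks:
            predecessors[link.sink].add(link.source)
        return list(TopologicalSorter(predecessors).static_order())


wf = Workflow(
    name="demo",
    processes=["fetch", "align", "plot"],
    datalinks=[DataLink("fetch", "align"), DataLink("align", "plot")],
)
print(wf.execution_order())
```

Describing a workflow as data rather than as engine-specific code is what lets a description outlive the system that originally ran it.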
  16. 16. Workflow Provenance (wfprov)
      Namespace: http://purl.org/wf4ever/wfprov#
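Workflow provenance records what a run actually consumed and produced, which is what allows a third party to audit a result. Here is a minimal sketch of that idea in plain Python, assuming a hypothetical `ProcessRun` record and `lineage` function; it does not reproduce the wfprov vocabulary, whose figure is not preserved in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRun:
    """Hypothetical record of one executed workflow step."""
    process: str
    used: list = field(default_factory=list)       # artifact ids consumed
    generated: list = field(default_factory=list)  # artifact ids produced


def lineage(artifact, runs):
    """Return every artifact that `artifact` was derived from, transitively."""
    produced_by = {out: run for run in runs for out in run.generated}
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        run = produced_by.get(current)
        if run is None:
            continue  # a workflow input: nothing recorded upstream
        for used in run.used:
            if used not in seen:
                seen.add(used)
                stack.append(used)
    return seen


# Illustrative run log; the artifact names are made up.
runs = [
    ProcessRun("fetch", used=["query.txt"], generated=["sequences.fasta"]),
    ProcessRun("align", used=["sequences.fasta"], generated=["alignment.aln"]),
]
print(sorted(lineage("alignment.aln", runs)))
```

Walking the records backwards from a result gives the set of inputs and intermediate artifacts that an auditor would need to inspect.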
  17. 17. Technical infrastructure
      • Models: Semantic Web Encoding
        • Research Object
        • Annotation
        • Provenance
        • Evolution and Versioning
      • Services: Web APIs, REST services
        • Foundational, Extension, User
        • APIs, Architecture
      • Principles
        • Map into standards
        • Adopt standards
        • Lightweight components
      • Ecosystem
        • Command line
        • Portal
        • Third party systems
  18. 18. The Wf4Ever Proposal
      [Figure: service layers labelled User Clients, Extension Services, Foundation Services]
  19. 19. Wf4Ever Reference Implementation Prototype, Dec 2011
      • Access & Usage Clients: Dropbox Client, RO Portal, RO Manager Tool, ROBox
      • Data Management & Analysis Services: Stability Evaluation, Completeness Evaluation, Recommender
      • Storage Services and Lifecycle Services: Taverna Workflow Mgmt System, RO Digital Library
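The slide names a Completeness Evaluation service but does not say what it checks. As a rough illustration of the general idea only, the sketch below compares what a Research Object aggregates against a checklist of required roles. The `completeness` function, the role labels and the file names are all hypothetical and do not describe the Wf4Ever service.

```python
def completeness(aggregated, required):
    """Return the required roles that no aggregated resource fulfils."""
    present = {role for _, role in aggregated}
    return sorted(set(required) - present)


# Hypothetical (resource, role) pairs for one Research Object.
aggregated = [
    ("workflow.t2flow", "workflow"),
    ("inputs.csv", "input data"),
]
# Hypothetical checklist of roles a complete Research Object should cover.
required = ["workflow", "input data", "provenance trace"]

missing = completeness(aggregated, required)
print("complete" if not missing else f"missing: {', '.join(missing)}")
```

Returning the missing roles rather than a bare true or false tells the author what to add before the object can be considered complete.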
  20. 20. Roadmap Year 1 (Dec 2010 to Dec 2011)
      » Exploration (2011)
        › Problem specification and requirements identification
        › Better understanding of workflow preservation needs from the domains (what does it mean to preserve a scientific workflow?)
        › Proofs of concepts
        › Preliminary models, components, and integrated reference implementation
        › Result identification
  21. 21. Roadmap Year 2 (Dec 2011 to Dec 2012)
      » Realization/validation (2012)
        › Validate the models, architectures and software in practice
        › Distributed components with different access/security arrangements, forming REST APIs and specifications
        › RO Content Campaign: Generate 1000s of ROs
        › First productization phase: Stable releases of models and reference implementation
        › Decay monitoring and notification (why my workflow is no longer stable), reacting to decay, attribution and credit support beyond recommendation. Detailed use of provenance
        › Execution and interoperability support (SHIWA integration)
  22. 22. Roadmap Year 3 (Dec 2012 to Dec 2013)
      » Exploitation (2013)
        › Final productization phase
        › Deployment in user environments and systems, enhanced with workflow preservation capabilities
          › RO-enabled myExperiment
          › RO-enabled Galaxy
          › RO-enabled Dataverse
          › … and more!
        › Deployment in publishers, e.g. Elsevier, Digital Science, GigaScience
  23. 23. Collaborations and impact
      » SHIWA – Sharing Interoperable Workflows
      » Publishers/journals: Elsevier, GigaScience (by BGI)
      » OpenPHACTS (nanopublications)
      » SCAPE (dataset preservation)
      » BioVeL (biodiversity - species preservation!)
      » Dataverse (data repository)
      » Galaxy (workflow system for genomics)
      » GenomeSpace (data integration platform)
  24. 24. Thank you! Any Questions?
      http://www.wf4ever-project.org/
      This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.