2012 03-28 Wf4ever, preserving workflows as digital research objects

Wf4Ever:
Preserving workflows as
digital Research Objects
Stian Soiland-Reyes
myGrid, University of Manchester

EGI Community Forum 2012, Workflow Systems workshop
Leibniz Supercomputing Centre, Munich, 2012-03-28

My background

Taverna - Scientific Workflow Management System
http://www.taverna.org.uk/
~85000 downloads
EU projects: SCAPE, BioVeL, HELIO, e-Lico, VPH-SHARE, EGI-InSPIRE, …

myExperiment - Web 3.0 virtual
environment, library and social
network for workflows
http://www.myexperiment.org/
~5000 registered users
~2200 workflows
~21 different systems


“A biologist would rather share their
toothbrush than their gene name”

Mike Ashburner and others
Professor in Dept of Genetics,
University of Cambridge, UK

http://www.myexperiment.org/

• “Facebook for Scientists” ...but different to Facebook!
• A repository of research methods
• A social network of people and things
• A Social Virtual Research Environment

• A probe into researcher behaviour
• Open source (BSD) Ruby on Rails app
• REST and SPARQL, Linked Data
• Influenced BioCatalogue, MethodBox and SysMO-SEEK

myExperiment currently has 5378 members, 292 groups, 2273
workflows, 534 files and 217 packs

• Workflow Preservation
• Research Objects
• Provenance
• Recommendation
• Astronomy and Genomics
http://www.wf4ever-project.org/

Wf4Ever
Challenges
Preservation of scientific workflows in data-intensive science
» Scientific workflows enable automation of scientific methods and encourage best practices to be shared
» Workflows need to be preserved for
› Reuse, fundamental for incremental scientific development
› Method reproducibility, key for credit and publication
» Workflow preservation is complex!
» Heterogeneous types of information need to be aggregated, including workflows and related resources, forming research objects
» Research objects need to be trusted and understandable n years from now
» Social aspects need to be addressed in order to support reuse in scientific communities

The R.* dimensions

Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.

“Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/

Wf4Ever
Forms of decay
Workflow Decay
• Service decay
  • Flux/decay/unavailability
• Data decay
  • Formats/ids/standards
• Infrastructure decay
  • Platform/resources

Experiment Decay
• Methodological changes
• New technologies
• New resources/components
• New data
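
The workflow decay categories above (service, data, infrastructure) could be used to group the dependencies of a workflow that are no longer available. The following is only a minimal sketch under stated assumptions: the `Dependency` record, the `find_decay` function, the example names and the example.org locations are all hypothetical and are not part of any Wf4Ever component. The availability check is passed in as a function, so a real probe (for instance an HTTP request) can be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Decay categories taken from the slide; everything else is hypothetical.
SERVICE, DATA, INFRASTRUCTURE = "service", "data", "infrastructure"


@dataclass
class Dependency:
    name: str
    kind: str       # one of SERVICE, DATA, INFRASTRUCTURE
    location: str   # e.g. a service endpoint or a dataset identifier


def find_decay(deps: List[Dependency],
               is_available: Callable[[Dependency], bool]) -> Dict[str, List[str]]:
    """Group the names of unavailable dependencies by decay category."""
    report: Dict[str, List[str]] = {SERVICE: [], DATA: [], INFRASTRUCTURE: []}
    for dep in deps:
        if not is_available(dep):
            report.setdefault(dep.kind, []).append(dep.name)
    return report


deps = [
    Dependency("blast_service", SERVICE, "http://example.org/blast"),
    Dependency("input_sequences", DATA, "http://example.org/data/seqs.fasta"),
    Dependency("workflow_engine", INFRASTRUCTURE, "local"),
]

# Stand-in probe: pretend that only the web service has gone away.
offline = {"blast_service"}
report = find_decay(deps, lambda dep: dep.name not in offline)
print(report)
```

Experiment decay (new methods, technologies or data) is not covered by this sketch, since it cannot be detected by an availability check alone.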

Preservation, Conservation, Recreating

Preserving
Archived Record
Fixed Snapshots
Review
Rerun & Replay

Conserving
Active Instrument
Live
Rerun & Reuse
Repair & Restore

Recreating
Archived Record
Active Instrument
Live
Rebuild Recycle Repurpose


Workflow Decay
Decay at different abstraction levels

[Figure: workflow abstraction levels, labelled "Redo" and, three times, "Flux". Source: http://www.gridworkflow.org/kwfgrid/gwes/docs/]

Research objects


Research Objects as Social Objects


http://purl.org/wf4ever/ro#
Research Object model core (simplified)

[Figure: simplified RO core. Classes shown: ro:ResearchObject, ro:Resource, ro:Manifest, wfdesc:Workflow, ro:AggregatedAnnotation. Properties shown: ore:aggregates, ore:isDescribedBy, ro:annotatesAggregatedResource.]

Note: This figure shows a simplified view of the RO core.

RO specification: http://wf4ever.github.com/ro/
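
The terms in the simplified core can be illustrated as a handful of RDF triples. This is a rough sketch, not the project's tooling: it uses plain Python tuples rather than an RDF library, the example.org URIs and file names are hypothetical, and the direction of each property (the research object aggregates resources and is described by a manifest; an annotation points at the resource it annotates) is an assumption based on a reading of the RO specification rather than something this slide states. Typing the aggregated file as wfdesc:Workflow is likewise a simplification for illustration.

```python
# Namespaces: ro: and wfdesc: appear on the slides; ore: is the OAI-ORE terms namespace.
RO = "http://purl.org/wf4ever/ro#"
WFDESC = "http://purl.org/wf4ever/wfdesc#"
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_research_object(ro_uri, manifest_uri, workflow_uri, annotation_uri):
    """Return (subject, predicate, object) triples for a one-workflow RO."""
    return [
        (ro_uri, RDF_TYPE, RO + "ResearchObject"),
        # Assumed direction: the RO is described by its manifest.
        (ro_uri, ORE + "isDescribedBy", manifest_uri),
        (manifest_uri, RDF_TYPE, RO + "Manifest"),
        # Assumed direction: the RO aggregates its resources.
        (ro_uri, ORE + "aggregates", workflow_uri),
        (workflow_uri, RDF_TYPE, RO + "Resource"),
        (workflow_uri, RDF_TYPE, WFDESC + "Workflow"),
        (ro_uri, ORE + "aggregates", annotation_uri),
        (annotation_uri, RDF_TYPE, RO + "AggregatedAnnotation"),
        # Assumed direction: the annotation points at the annotated resource.
        (annotation_uri, RO + "annotatesAggregatedResource", workflow_uri),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % triple for triple in triples)


# Hypothetical identifiers, for illustration only.
base = "http://example.org/ro/demo/"
triples = describe_research_object(
    ro_uri=base,
    manifest_uri=base + "manifest.rdf",
    workflow_uri=base + "workflow.t2flow",
    annotation_uri=base + "annotations/1",
)
print(to_ntriples(triples))
```

The printed lines could be loaded into any RDF store for querying; the specification linked above remains the authority on the actual vocabulary.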

http://purl.org/wf4ever/ro#
Research Object model core


http://purl.org/wf4ever/wfdesc#
RO model: Workflow Description


http://purl.org/wf4ever/wfprov#
Workflow Provenance (wfprov)


Technical infrastructure

• Models: Semantic Web encoding
  • Research Object
  • Annotation
  • Provenance
  • Evolution and Versioning
• Services: Web APIs, REST services
  • Foundational, Extension, User
  • APIs, Architecture
• Principles
  • Map into standards
  • Adopt standards
  • Lightweight components
• Ecosystem
  • Command line
  • Portal
  • Third party systems

The Wf4Ever Proposal
Services

[Figure: three layers: User Clients, Extension Services, Foundation Services]


Wf4Ever Reference Implementation
Prototype, Dec 2011

Access & Usage Clients: RO Portal, RO Manager Tool, Dropbox Client ROBox

Data Management & Analysis Services: Stability Evaluation, Completeness Evaluation, Recommender

Storage Services: RO Digital Library

Lifecycle Services: Taverna Workflow Mgmt System


Roadmap
Year 1 (Dec 2010 – Dec 2011)

» Exploration (2011)
› Problem specification and requirements identification
› Better understanding of workflow preservation needs from the domains (what does it mean to preserve a scientific workflow?)
› Proofs of concepts
› Preliminary models, components, and integrated reference implementation
› Result identification


Roadmap
Year 2 (Dec 2011 – Dec 2012)

» Realization/validation (2012)
› Validate the models, architectures and software in practice
› Distributed components with different access/security arrangements – forming REST APIs and specifications
› RO Content Campaign: Generate 1000s of ROs
› First productization phase: Stable releases of models and reference implementation
› Decay monitoring and notification (why is my workflow no longer stable?), reacting to decay, attribution and credit support beyond recommendation. Detailed use of provenance
› Execution and interoperability support (SHIWA integration)

Roadmap
Year 3 (Dec 2012 – Dec 2013)

» Exploitation (2013)
› Final productization phase
› Deployment in user environments and systems, enhanced with workflow preservation capabilities
  › RO-enabled myExperiment
  › RO-enabled Galaxy
  › RO-enabled Dataverse
  › … and more!
› Deployment at publishers, e.g. Elsevier, Digital Science, GigaScience


Collaborations and impact
» SHIWA – Sharing Interoperable Workflows
» Publishers/journals: Elsevier, GigaScience (by BGI)
» OpenPHACTS (nanopublications)
» SCAPE (dataset preservation)
» BioVeL (biodiversity - species preservation!)
» Dataverse (data repository)
» Galaxy (workflow system for genomics)
» GenomeSpace (data integration platform)


Thank you!

Any Questions?

http://www.wf4ever-project.org/

This work is licensed under the Creative Commons Attribution 3.0
Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative
Commons, 444 Castro Street, Suite 900, Mountain View, California,
94041, USA.
