2011 07-16 SCUFL2 - because a workflow is more than its definition (BOSC 2011)







Scufl2 – because a workflow is more than its definition

Stian Soiland-Reyes, Alan R Williams, Stuart Owen, David Withers and Carole Goble

School of Computer Science, University of Manchester, UK

Project site: http://www.taverna.org.uk/

Source code: https://github.com/mygrid/scufl2 http://taverna.googlecode.com/

License: GNU Lesser General Public License (LGPL) 2.1

Taverna is a scientific workflow management system that has gained popularity amongst scientists in bioinformatics, chemistry, astronomy and other domains. Since its inception in 2003, Taverna's workflow language has evolved from Scufl which was designed for the FreeFluo workflow engine. Scufl defines a directed acyclic graph of the data flow between processors (services) which receive and produce data at input/output ports. Scufl workflows are stored in a lightweight XML format.

t2flow adapted these concepts for the Taverna 2 workflow engine, adding new capabilities such as extensible execution control. The t2flow XML format was created as a serialisation of internal Java beans, which unfortunately makes it harder than Scufl for non-Taverna clients to consume or produce. There have also been growing demands for bundling a workflow with related resources such as example data and semantic annotations. Addressing these issues led to a new workflow language, Scufl2, which is presented here.

By adapting linked data technology and preservation methodologies for research objects (http://bit.ly/dKiGz7), Scufl2 is not only a platform-independent workflow language which can be inspected, modified, created and executed by third-party tools and systems; it is extensible to allow the capture of workflow execution inputs and outputs, reference data sets, provenance, annotations, documentation, publications, alternative formats and representations, and even embedded binary service implementations.

Scufl2 comes with a Java API independent of Taverna that can be used for programmatic access to read and write Scufl2 workflow bundles. A workflow bundle is a structured ZIP-file based on Adobe UCF, ePub OCF and the Open Document Format (ODF), with the workflow definitions included as XML Schema-conformant documents which are also valid RDF/XML. The constraints from the schema allow clients to read and write Scufl2 workflow definitions as regular structured XML. Richer RDF-enabled clients may link workflow definitions with external resources and additional annotations stored in separate RDF files in the bundle, perhaps using vocabularies such as Dublin Core.
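
As a rough illustration of the "structured ZIP" idea, the Python sketch below writes a minimal bundle using only the standard library. The media type string, the convention of storing an uncompressed `mimetype` entry first (borrowed here from ePub OCF and ODF), and the placeholder file content are assumptions made for illustration; they are not taken from the Scufl2 specification or its Java API.

```python
import io
import zipfile

# Assumed media type, for illustration only; check the Scufl2 specification.
MEDIA_TYPE = "application/vnd.taverna.scufl2.workflow-bundle"


def write_bundle(files):
    """Return the bytes of a ZIP whose first entry is an uncompressed `mimetype`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Stored (not deflated) and written first, so that tools which sniff
        # the start of the archive can see the media type in plain text.
        bundle.writestr(zipfile.ZipInfo("mimetype"), MEDIA_TYPE,
                        compress_type=zipfile.ZIP_STORED)
        for name, content in files.items():
            bundle.writestr(name, content)
    return buffer.getvalue()


# Placeholder content; a real bundle would carry the workflow definitions.
raw = write_bundle({"workflowBundle.rdf": "<!-- placeholder -->"})
```

Because the result is an ordinary ZIP archive, any ZIP-aware tool can list or unpack it without knowing anything about Taverna.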

The workflow structure is defined using an OWL ontology, and annotated with URIs so that third parties can form semantic statements about any component of any Scufl2 workflow, for instance to say that a particular service produces outputs of a certain type, or that a given data link was added by a researcher other than the one who built the rest of the workflow. An unzipped workflow bundle can be published using any standard web server and so become part of the Linked Open Data cloud.

Semantic annotations and a manifest for the bundle declare the purpose of, and the links between, the different components forming a workflow. This allows third parties to extract and append annotations about data and services used by the workflow. The evolution of the workflow definition itself can also be included in these annotations, with references to previous versions and authors.

In recent years, Taverna has become an interesting target for researchers investigating published workflows, such as those found on myExperiment. Exploring these workflow definitions using the Scufl2 API can reveal compatibilities and implicit data types of web services, providing new annotations for service registries like BioCatalogue. Sites and tools can add additional descriptions of the workflow and the experiment directly to the Scufl2 workflow bundle without affecting execution behaviour or requiring any d




Usage Rights

CC Attribution-ShareAlike License


Presentation Transcript

  • 1. Stian Soiland-Reyes
    myGrid, School of Computer Science
    University of Manchester, UK
    Scufl2: Because a workflow is more than its definition
    BOSC 2011
    Vienna, 2011-07-16
  • 2. Taverna workflows
    A set of (local and remote) services to analyze or manage data
    Nested workflows are also services
    Data links connect services
    i.e. output from service A is input to services B and C
    Describes the desired dataflow instead of process coordination
    Automatic iterations
    Can customize list handling and control links
    Taverna 2.3 released 2011-07-14
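
    The dataflow idea on this slide can be sketched in a few lines of Python (3.9 or later, for `graphlib`). This is only a toy model: the service names, the lambda bodies and the `run` helper are hypothetical, and the sketch runs services one at a time in dependency order, whereas a real engine can start a service as soon as its inputs arrive.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical services: plain functions standing in for local or remote calls.
services = {
    "A": lambda inputs: "seq",
    "B": lambda inputs: inputs["A"].upper(),
    "C": lambda inputs: inputs["A"] * 2,
}

# Data links as (source, sink) pairs: the output of A feeds both B and C.
links = [("A", "B"), ("A", "C")]


def run(services, links):
    """Run each service once all of its upstream services have produced data."""
    upstream = {name: set() for name in services}
    for source, sink in links:
        upstream[sink].add(source)
    results = {}
    # static_order() yields every service after all of its predecessors.
    for name in TopologicalSorter(upstream).static_order():
        inputs = {source: results[source] for source in upstream[name]}
        results[name] = services[name](inputs)
    return results


results = run(services, links)
```

    Only the links are declared; the order of execution follows from them, which is the sense in which the workflow describes dataflow rather than process coordination.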
  • 3. Taverna workflow features
    Nested workflows
    Reuse existing components
    Implicit iterations
    With customizable list handling
    Process partial iteration results early
    Run as soon as data is available
    Retries, failover, looping
    For stability and conditional testing
    Plugin-extensible execution control
    Ideas: caching, error detection, dynamic service lookup
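
    A minimal sketch of two of these features, implicit iteration and retries, assuming a service is just a Python function of one value. The helper names and the simulated flaky service are hypothetical and say nothing about how Taverna's engine implements these features.

```python
def implicit_iterate(service, value):
    """Apply a single-item service to a value, mapping over lists of any depth."""
    if isinstance(value, list):
        return [implicit_iterate(service, item) for item in value]
    return service(value)


def with_retries(service, attempts=3):
    """Wrap a service so that a failing call is retried up to `attempts` times."""
    def wrapper(value):
        last_error = None
        for _ in range(attempts):
            try:
                return service(value)
            except Exception as error:  # a real engine would be more selective
                last_error = error
        raise last_error
    return wrapper


calls = {"count": 0}


def flaky_length(text):
    """Hypothetical service that fails on its very first call only."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise IOError("simulated transient failure")
    return len(text)


# The service expects one string, but is handed a nested list of strings.
lengths = implicit_iterate(with_retries(flaky_length), ["ab", ["cde", "f"]])
```

    The service itself never sees a list: the wrapper layers handle iteration and failure, which mirrors the idea of execution control sitting around the service rather than inside it.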
  • 4. Extensible workflow engine
    New service types/protocols
    Execution control like looping/branching/start/stop
    Service discovery at runtime
  • 5. Taverna’s existing wf formats
    Taverna 1: SCUFL 2004-2007
    Lightweight XML format
    Extensions in arbitrary formats (“Put your XML element here”)
    Parsed and written by many 3rd parties
    Taverna 2: t2flow 2007-2011
    Supports control and service extensions of T2
    Annotation system
    ‘Heavy’ XML serialisation of Java beans
    Tricky to use by 3rd parties
  • 6. SCUFL2 motivations
    Easy to use for third-parties
    Semantic web/linked open data compatible identifiers and ontology
    Extensible in a controlled manner
    Plugins should provide a minimal ‘schema’
    But parsers should not need to know about plugin
    Bundle relevant resources
    Input/output data, provenance, scripts, binaries
    Annotations on anything
    Reproducible workflow results
  • 7. Research objects
    Reusable – used as part of new study;
    Repurposeable – reuse the pieces in a new (and different) study. Substitute alternative data sets, methods;
    Repeatable – repeat the study, possibly years later;
    Reproducible – a special case of repeatability with a complete set of information/results to work towards;
    Replayable – go back and see what happened;
    Referenceable – cite in publications;
    Revealable – provenance and audit;
    Re-interpretable – crossing boundaries;
    Respectful – appropriate credit and attribution;
    Retrievable – discover and acquire.
  • 8. Research Object vision
    ROs: Aggregations to support sharing/publication
    Incorporating methods, data, people
    Research Objects will allow us to conduct research in ways that are
    Efficient: cheaper to borrow than recreate
    Effective: larger scale through reuse
    Ethical: Benefiting wider communities, not just individuals
    Could I have a copy of your Research Object please?
  • 9. Wf4Ever project
    Architecture/implementation for workflow preservation, sharing and reuse
    Research Object models
    Workflow Decay, Integrity and Authenticity
    Workflow Evolution and Recommendation
    Driven by Use Cases: Biology & Astronomy
    …technological infrastructure for the preservation and
    efficient retrieval and reuse of scientific workflows in a range
    of disciplines.
  • 10. What is SCUFL2?
    File format for (Taverna) workflows and data
    Specification (wiki)
    Ontology (OWL)
    Schema (XSD)
    API (Java +Ruby?)
    Conversion Tool (Java)
  • 11. SCUFL2 bundles
    ZIP archive
  • 12. Bundle manifest
    “Structured” zip-file
    (can be unpacked to be exposed on the web)
    Self-documenting media type (OpenOffice ODF, ePub OCF, Adobe UCF) – for tools like file(1) and mime magic
    Root file: Primary document of the bundle (ePub OCF/Adobe UCF).
    Alternative representations of same workflow bundle allowed (Turtle, JSON, HTML, etc)
    <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
      <rootfiles>
        <rootfile full-path="workflowBundle.rdf" media-type="application/rdf+xml" />
        <!-- Alternative representation:
        <rootfile full-path="workflowBundle.ttl" media-type="text/turtle" /> -->
      </rootfiles>
    </container>
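
    A client could locate the primary document of a bundle roughly as follows. This Python sketch parses a container document like the one on this slide using only the standard library; the `<rootfiles>` wrapper follows the ePub OCF convention and, like the `root_files` helper name, is an assumption rather than something the slide shows.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

# Container document modelled on the slide; the <rootfiles> wrapper is assumed.
CONTAINER_XML = """<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="workflowBundle.rdf" media-type="application/rdf+xml" />
  </rootfiles>
</container>"""


def root_files(container_xml):
    """Return (path, media type) pairs for every rootfile in the container."""
    root = ET.fromstring(container_xml)
    return [(element.get("full-path"), element.get("media-type"))
            for element in root.iter("{%s}rootfile" % CONTAINER_NS)]


roots = root_files(CONTAINER_XML)
```

    Each pair names one representation of the same bundle, so a tool can pick whichever media type it understands.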
  • 13. Workflow bundle
    Unique identifier which can be used as prefix for all relative references in bundle
    Suggests main workflow and main profile, but executor could (e.g. by parameter) run a different workflow or profile
    <rdf:RDF xmlns="http://ns.taverna.org.uk/2010/scufl2#" xmlns:rdf=".." xmlns:rdfs=".." xmlns:xsi=".."
        xsi:schemaLocation="http://ns.taverna.org.uk/2010/scufl2# http://ns.taverna.org.uk/2010/scufl2/scufl2.xsd .."
        xsi:type="WorkflowBundleDocument" xml:base="./">
      <WorkflowBundle rdf:about="">
        <sameBaseAs rdf:resource="http://ns.taverna.org.uk/2010/workflowBundle/28f7c..a0ef731/" />
        <mainWorkflow rdf:resource="workflow/HelloWorld/" />
        <Workflow rdf:about="workflow/HelloWorld/">
          <rdfs:seeAlso rdf:resource="workflow/HelloWorld.rdf" />
        </Workflow> <!-- plus each nested <Workflow> -->
        <mainProfile rdf:resource="profile/tavernaWorkbench/" />
        <Profile rdf:about="profile/tavernaWorkbench/">
          <rdfs:seeAlso rdf:resource="profile/tavernaWorkbench.rdf" />
        </Profile> <!-- plus optional alternative <Profile> -->
        <rdfs:seeAlso rdf:resource="annotation/user_annotations.rdf" />
        <rdfs:seeAlso rdf:resource="annotation/myExperiment-wf-765.rdf" />
      </WorkflowBundle>
    </rdf:RDF>
    Additional (non-executable) annotations and metadata
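
    Because the bundle document is schema-constrained RDF/XML, a client without an RDF library can read it as regular XML. The Python sketch below finds the main workflow and main profile in a cut-down bundle document and resolves them against the bundle identifier. The all-zero identifier and the `main_resources` helper are hypothetical placeholders, and the RDF namespace is the standard W3C one that the slide abbreviates as "..".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SCUFL2 = "http://ns.taverna.org.uk/2010/scufl2#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical bundle identifier used as the base for relative references.
BUNDLE_BASE = ("http://ns.taverna.org.uk/2010/workflowBundle/"
               "00000000-0000-0000-0000-000000000000/")

# Cut-down version of the bundle document shown on the slide.
BUNDLE_RDF = """<rdf:RDF xmlns="http://ns.taverna.org.uk/2010/scufl2#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <WorkflowBundle rdf:about="">
    <mainWorkflow rdf:resource="workflow/HelloWorld/" />
    <mainProfile rdf:resource="profile/tavernaWorkbench/" />
  </WorkflowBundle>
</rdf:RDF>"""


def main_resources(bundle_rdf, base):
    """Resolve the main workflow and profile references against the base URI."""
    bundle = ET.fromstring(bundle_rdf).find("{%s}WorkflowBundle" % SCUFL2)
    resolved = {}
    for prop in ("mainWorkflow", "mainProfile"):
        element = bundle.find("{%s}%s" % (SCUFL2, prop))
        resolved[prop] = urljoin(base, element.get("{%s}resource" % RDF))
    return resolved


resolved = main_resources(BUNDLE_RDF, BUNDLE_BASE)
```

    The resolved values are absolute URIs, which is what lets third parties make statements about a workflow or profile from outside the bundle.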
  • 14. Technologies and standards
    OpenOffice ODF manifest
    ePub OCF/Adobe Universal Container Format (UCF) container rootfile, mimetype
    RDF/XML and OWL structure and annotations
    XML Schema for ‘structured’ RDF/XML
    Open Annotation Collaboration (OAC)?
    OAI-ORE? Aggregation
    Open Provenance Model (OPM)?
  • 15. Status:
    Want people to try it out
    Open for suggestions and feedback
    Beta to be released in Autumn 2011
    Native format for upcoming Taverna 3 in 2012
    Load/Save plugin for Taverna 2.3 2011 Q4
    Future plans
    Explore further research object & provenance
    Converting to/from wf systems like Galaxy?
  • 16. http://www.mygrid.org.uk/
  • 17. http://www.wf4ever-project.org/
  • 18. Acknowledgements
    Funded by: EPSRC EP/G026238/1; European Commission’s 7th FWP FP7-ICT-2007-6 270192;  FP7-ICT-2007-6 270137
  • 19. More information
    Poster 21