10 Best Practices for Workflow Design

Presented at the 2nd BioVeL Workshop on taxonomic and phylogenetic workflows (http://www.biovel.eu/index.php?option=com_content&view=article&id=43:ms6-workshop&catid=22:biovel-meetings&Itemid=122)

  • If you like this, you might also want to read the short paper that came out of it: http://ceur-ws.org/Vol-952/paper_23.pdf (SWAT4LS 2012 conference proceedings, http://ceur-ws.org/Vol-952/)
  • Designing a good workflow is part of doing good research!
  • Workflow design is a variant of software design, and it draws on experimental science as well. If you know about one or both of them, you should apply their principles to workflow design too. (In the end, using common sense about doing good science is a general best practice for creating workflows.)
    Typical cycle: define the hypothesis and approach; sketch a workflow of the approach; implement the workflow; iterate by trial and error.
    Comment: where are the workflow design patterns?
  • A sketch consists of boxes without content. It can be made in Taverna (e.g. with empty script boxes), as a PowerPoint flow chart, or on a napkin; if it is digital (e.g. Taverna), it can be stored digitally.
    Comment: add the concept mining workflow and a sketch. Cite Eleni: 'helps me to share workflow while developing it, that makes it better'.
    How? In Taverna, using empty Beanshell scripts; in PowerPoint; in a sketch book.
    Why? It provides a reference point for the main task(s) of the workflow throughout implementation; it promotes sharing across computers and workflow systems, because the sketch is abstract; it helps to design the experiment; it helps communication (supervisors, colleagues).
  • The workflow on the left explains the basic steps of a text mining process. The expanded workflow is much harder to understand. We can use each nested workflow as a workflow on its own.
    How? Describe and implement each of the executable processes in a workflow individually and independently. In Taverna this can be done through nested workflows.
    Why? It facilitates independent testing and validation of each individual module, and it encourages re-use.
    Note: Make sure that you publish the separate modules as well as the final nested workflow (unfortunately, myExperiment does not support this very well), or at least annotate the components when you publish the whole.
  • How? Consider whether you want to populate data models/databases or create outputs as disconnected collections of files. Consider who the results are for (an overview for users, or the next workflow component). General advice: at least have a report as an output (provenance will have the separate parts anyway). Use Taverna for provenance collection (intermediate results are captured by the provenance engine).
    Why? It is easier to think about this at the design stage than to adjust a finished workflow. It also gives structure to potentially large output data.
  • How? Example inputs and outputs can be recorded in Taverna. Alternatively, add input or output files to a pack containing the workflow. Use real example data.
    Why? To help understand the workflow; for validation; for maintenance.
    Note: Make sure that the input and output examples are coupled. Keep in mind that the output has a timestamp: it may change due to changes in underlying databases.
  • How? Choose meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow. Focus on how a component is used in this workflow and why it is there. If it exists, refer to information about what the component does in general (e.g. by referencing a service on BioCatalogue). Assume that a referenced resource may disappear or change at some time in the future. Use Taverna description fields and example fields*: Taverna keeps them with the workflow and myExperiment uses this information. Keep any notes that are related to the workflow, but not part of it, linked to it*. Examples of useful "extra" information: execution time, keywords, contact information, attribution. myExperiment offers some of this, but it is best to put it in the workflow descriptions.
    Why? Doing good science; recording what is needed for a publication later on; increasing re-usability. Cite Kostas: 'many workflows are badly documented computer programs'.
    * The wf4ever project will provide additional support (and incentives) for describing (the purpose of) workflow components, related objects and references (e.g. data sets), and support for storing the elements of an experiment with their metadata in a structured way.
  • Facilitate understanding and reuse
  • How? Use Web Services or any Taverna widget except the external tool; use the external tool only when it runs over ssh on a publicly accessible server. Alternatively, use Taverna with local tools that are installed on a publicly accessible server running the Taverna server, or use local tools from an environment that is easy to set up, such as BioLinux (only for a certain niche of users). TRY IT!!
    Why? Others will be able to run the workflow. It is proof of reproducibility.
  • How? Choose a service that is reliable, based on: BioCatalogue reliability statistics (in practice: check on BioCatalogue whether it has a green light; at the moment there is not much more you can do); how often it is used in other workflows; contact with service providers (communicate!); the reputation of the institution providing the service. Check the trustworthiness of the service provider (this can also be a person, in which case you can check whether they will remain at an institution to maintain the service).
    Why? To prevent workflow decay and prolong the life of the workflow.
    Note to service developers: many workarounds and ugly workflow practices come from having to deal with badly behaved services!
  • Web Services are digital, their creators not. Communication saves web services and workflows from decay.
  • A common misconception is that because they are workflows, they are automatically stable. It takes effort and often communication to reuse work, especially when using 'state-of-the-art' products made by scientists.
    How? Make your own workflows modular, since this promotes reuse. Search myExperiment and filter on most downloaded or most viewed. Check whether a workflow has been used in a publication. Use your contacts: maybe someone has tried to solve something similar before using a workflow? Try and try harder, contact authors!
    Why? Another user who is familiar with one of your workflows is more likely to understand another workflow that you designed. It is beneficial when repairing workflows: repairing a given workflow may entail repairing the workflows in which it is used as a subworkflow. It fights redundancy.
    Note: attribute others and respect licenses.
  • Examples: http://myExperiment.org/workflows/74?version=12 and http://myExperiment.org/packs/258
    How? Share your workflow (don't forget contact info!) on myExperiment, on other social media, or by e-mailing it to colleagues. Cite your workflow when publishing, using a stable identifier such as the one myExperiment provides. Use the pack functionality in myExperiment to bundle your workflow with other important documents, such as a publication.
    Why? Good science: share your results. Get cited: fame! Progress: let others build on your work without reinventing it.
  • How? Act on information about deprecated services, either by changing services or by providing a note that the specific process in the workflow is no longer executable. Put your services on BioCatalogue (you don't have to be the owner) and your workflows on myExperiment (notification is planned). Regularly test the workflow (like 'unit tests').
    Why? Good practice: this is already demanded for some types of publications, like an application note in Bioinformatics. It fights workflow decay and prolongs the life of the workflow.
  • A scientific workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving, i.e. the implementation of a scientific method. Workflows need to be preserved (and conserved). More on this later.
  • Could we skip this slide to save time?
  • 10 Best Practices for Workflow Design
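Best practices 4 and 10 in the notes above (coupled example inputs and outputs, and regular testing "like unit tests") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Example`, `check_examples`, and `toy_workflow` are hypothetical names, and `toy_workflow` is a stand-in for a real workflow run; none of this is Taverna or myExperiment API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Example:
    """A coupled input/output pair, stamped with the date it was recorded."""
    inputs: Dict[str, str]
    expected: Dict[str, str]
    recorded_on: date


def check_examples(run: Callable[[Dict[str, str]], Dict[str, str]],
                   examples: List[Example]) -> List[str]:
    """Re-run every example and return a readable list of mismatches."""
    problems = []
    for ex in examples:
        actual = run(ex.inputs)
        if actual != ex.expected:
            problems.append(
                f"inputs {ex.inputs}: expected {ex.expected}, got {actual} "
                f"(example recorded on {ex.recorded_on.isoformat()})"
            )
    return problems


# Hypothetical stand-in for a real workflow run: it only normalizes a gene symbol.
def toy_workflow(inputs: Dict[str, str]) -> Dict[str, str]:
    return {"normalized_gene": inputs["gene"].strip().upper()}


# The recording date is kept with the example because outputs may change
# when underlying databases change.
EXAMPLES = [
    Example({"gene": " htt "}, {"normalized_gene": "HTT"}, date(2012, 5, 10)),
]

if __name__ == "__main__":
    for line in check_examples(toy_workflow, EXAMPLES) or ["all examples reproduce"]:
        print(line)
```

A mismatch does not necessarily mean the workflow is broken; the recording date in the message helps to judge whether an underlying resource has simply been updated since the example was stored.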

    1. The 10 Best Practices for Workflow Design. BioVeL M6 Workshop, Göteborg, May 10-11, 2012. Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid). Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)
    2. Our specialty: Knowledge Discovery. http://biosemantics.org. Disambiguation*; Text Mining; Substrates for Knowledge Discovery; Methods for Knowledge Discovery. Applications: • Predict protein-protein, protein-disease associations, gene prioritization • Genotype-phenotype studies, e.g. Huntington’s Disease, Metabolic Syndrome • Yours? * Global disambiguation initiative: http://snipurl.com/conceptweballiance
    3. Introduction: Why build good workflows? Good workflow design = good science!
    4. Introduction: Best practices for workflow design. Best practices for workflow design = best practices of experimental science + best practices of software engineering
    5. 1. Make a sketch workflow
    6. Best practice 1: Sketch an Abstract Workflow. PowerPoint courtesy of Eleni Mina
    7. 2. Use modules
    8. http://www.myexperiment.org/workflows/74.html
    9. 3. Think about the output (and the data in your workflow in general)
    10. Best practice 3: Think about the output ? http://...
    11. 4. Provide example inputs and outputs
    12. Recipe (Taverna 2.3 / Taverna 2.4): Select input/output; Right-click input/output; Select tab ‘Details’; Select ‘Annotation’; Click ‘Annotation’; Add Example; Add Example
    13. 5. Annotate
    14. Best practice 5: Annotate. Each component in Taverna can be annotated
    15. Best practice 5: Annotate and help your users
    16. 6. Make workflow executable from outside the local environment
    17. Best practice 6: Make workflow executable by others. How to check that others can execute your workflow? » Try it! Proof of executability › Ask a colleague › Use an external t2web runner » Tips › Use Web Services › If you use local command line tools • Install tools on a publicly accessible server (e.g. applies to Rserve) • Use a system that your users can set up (e.g. BioLinux)
    18. 7. Choose services carefully
    19. Best practice 7: Choose services carefully
    20. Best practice 7: Choose services carefully
    21. 8. Reuse existing workflows
    22. Best practice 8: The reuse workflow. Not a best practice, but a tip: know-how is important for reuse. [Flowchart. Steps: Check workflows on myExperiment; Use scripts from colleagues; Check services on BioCatalogue; Search the internet. Negative branches: Contact authors, Retry. Positive branches: Reuse, Attribute, Respect licences. Fallback: Invent a new wheel.]
    23. 9. Advertise
    24. Advertise: Unique reference for your papers and for others to cite
    25. 10. Maintain
    26. Best practice 10: Maintain. Best practices to support maintenance » Regularly check your workflow › Ask colleagues » Enable support for maintenance › Register your workflow on myExperiment › Register Web Services on » Enable peers to repair: annotate! » Note about versioning › No need to register all edits on myExperiment: use subversion › Register important updates on myExperiment
    27. Bonus tip: Use common sense as a scientist
    28. Workflow Forever: Preservation of good workflows for future applications. Workflow 74 “Protein Discovery” (2005); Workflow 2876 “Match gene lists by literature” (2012); Workflow 2805 “Get Pathway genes” (2012)
    29. Wf4Ever Outcomes for BioVeL: myExperiment 2.0; BioCatalogue; Taverna; Research Objects; Linked Data; Methods; Protocols for Preservation and Conservation
    30. The 10 Best Practices of Workflow Design. Thank you for your attention. More information: http://snipurl.com/workflowbestpractices. 1. Make a sketch workflow 2. Use modules 3. Think about the output 4. Provide example inputs and outputs 5. Annotate 6. Make it executable from outside the local environment 7. Choose services carefully 8. Reuse existing workflows 9. Advertise 10. Maintain
    31. Wf4Ever tooling: Sneak preview
    32. Supporting information: Workflow jargon › Scientific workflow: paradigm to describe, manage, and share complex scientific analyses › Workflow system: software to design, execute, and monitor scientific workflows › Module = nested workflow = workflow in a workflow = workflow component › Beanshell script: a Java-based scripting language, typically used for data type conversions in Taverna › Provenance: history or trace of a workflow run. Allows you to look at intermediate data, which workflows and services were run, and with what data.
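Two of the jargon terms above, "module" and "provenance", can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not how Taverna works internally: `run_with_provenance`, `tokenize`, `count_terms`, and `PIPELINE` are hypothetical names chosen for this example. Each module is a small function that can be tested on its own, and the runner keeps a trace of every intermediate result.

```python
from typing import Any, Callable, Dict, List, Tuple

Module = Callable[[Any], Any]


def run_with_provenance(modules: List[Tuple[str, Module]], data: Any):
    """Run modules in order; return the final result and a trace of each step."""
    trace = []
    for name, module in modules:
        result = module(data)
        # Record what went in and what came out, so intermediate data
        # can be inspected after the run.
        trace.append({"step": name, "input": data, "output": result})
        data = result
    return data, trace


# Hypothetical modules; each one can be tested and reused independently.
def tokenize(text: str) -> List[str]:
    return text.lower().split()


def count_terms(tokens: List[str]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


PIPELINE = [("tokenize", tokenize), ("count_terms", count_terms)]

if __name__ == "__main__":
    result, trace = run_with_provenance(PIPELINE, "Gene gene protein")
    print(result)
    for entry in trace:
        print(entry["step"], "->", entry["output"])
```

Keeping the modules as plain functions is the point of the design: `tokenize` can be validated or swapped without touching `count_terms`, which mirrors the argument for nested workflows.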
