IMPACT/myGrid Hackathon - Introduction to Taverna


Katy Wolstencroft gives an introduction to Taverna at the IMPACT/myGrid Taverna Hackathon, 14th November 2011

Published in: Education, Technology

  1. 1. An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011
  2. 2. Exercise 1: Exploring the Workbench <ul><li>Taverna can be downloaded from </li></ul><ul><li> Go to the page and find the latest version (2.3) </li></ul><ul><li>Download the correct version for your operating system </li></ul><ul><li>Follow the instructions in the Taverna installer </li></ul><ul><li>The following page shows a screenshot of Taverna and the different panels that make up the workbench </li></ul>
  3. 3. Taverna Workbench: Workflow Diagram, Services Panel, Workflow Explorer
  4. 4. 1. Workflow Diagram <ul><li>The visual representation of the workflow </li></ul><ul><li>Shows inputs/outputs, services and control flows </li></ul><ul><li>Allows editing of the workflow by dragging and dropping and connecting services together </li></ul><ul><li>Enables saving of workflow diagrams for publishing and sharing </li></ul>
  5. 5. 1. Workflow Explorer <ul><li>The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs and it shows where remote services are located. It also shows configuration details, such as iteration and looping (we will come back to these things later). </li></ul><ul><li>Workflow validation details can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available. </li></ul>
  6. 6. 1. Available Services Panel <ul><li>Lists services available by default in Taverna </li></ul><ul><ul><li>Local Java services </li></ul></ul><ul><ul><li>WSDL Web Services – secure and public </li></ul></ul><ul><ul><li>RESTful services </li></ul></ul><ul><ul><li>R Processor services (for statistical analyses) </li></ul></ul><ul><ul><li>Beanshell scripts </li></ul></ul><ul><ul><li>XPath scripts </li></ul></ul><ul><li>Allows the user to add new services or workflows from the web or from file systems – there are loads more available! </li></ul>
  7. 7. Exercise 2: Building a Simple Workflow <ul><li>In the Services panel, type ‘image’ into the search box. </li></ul><ul><li>Select ‘Get Image from URL’ </li></ul><ul><li>This is a local service, but web services work the same way </li></ul><ul><li>Many historical documents are stored as images on the web. This is a simple but useful service to help gather data </li></ul><ul><li>Drag this service across to the workflow diagram panel </li></ul>
  8. 8. Exercise 2: Building a Simple Workflow <ul><li>In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port” </li></ul><ul><li>Type a name (e.g. URL) for this input in the pop-up window and click “ok” </li></ul><ul><li>Do the same to create a new workflow output. Call this output “image” </li></ul>
  9. 9. Exercise 2: Building a Simple Workflow <ul><li>You now have three boxes in the diagram and we need to connect them up into a workflow </li></ul><ul><li>First, we need to find out how many inputs and outputs the ‘Get Image from URL’ service has </li></ul><ul><li>At the top of the workflow diagram, select the ‘show ports’ icon </li></ul>
  10. 10. Exercise 2: Building a Simple Workflow <ul><li>Click on the workflow input box and drag the linking arrow across to the URL input of the ‘get_image_from_URL’ service. </li></ul><ul><li>Link the image output of ‘get_image_from_URL’ to the workflow output port </li></ul>
  11. 11. Exercise 2: Building a Simple Workflow <ul><li>You have now built your first workflow! It should look something like this. </li></ul><ul><li>In many cases, you have to supply input data for EVERY service input port. In this case, however, the ‘base’ input is optional, so we will leave it. </li></ul><ul><li>Save the workflow by going to file -> save workflow </li></ul>
  12. 12. Exercise 2: Building a Simple Workflow <ul><li>Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench </li></ul>
  13. 13. Exercise 2: Building a Simple Workflow <ul><li>An input window will appear. As you can see, we have not yet added a description of the workflow or of the input </li></ul><ul><li>Click on ‘New Value’ in the input window and add the URL where it says “some input data goes here” </li></ul>
  14. 14. Exercise 2: Building a Simple Workflow <ul><li>Click “run workflow” </li></ul><ul><li>In the bottom left of the results window, click on the results. You will now see an image from the specified web page </li></ul><ul><li>Workflow results can be saved here if required by clicking on ‘save all values’ </li></ul>
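
For readers who prefer to think in code, the workflow built in Exercise 2 does roughly what the Python sketch below does: one input (a URL), one service call, one output (the image bytes). This is not how Taverna implements the service. The function names `resolve_url` and `get_image_from_url` are illustrative, and treating the optional ‘base’ input as a prefix for relative URLs is an assumption.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def resolve_url(url, base=None):
    """Combine the optional 'base' with 'url' (assumed behaviour of the 'base' port)."""
    return urljoin(base, url) if base else url


def get_image_from_url(url, base=None):
    """Fetch the resource at the given URL and return its raw bytes."""
    with urlopen(resolve_url(url, base)) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the example self-contained (no network access needed).
    print(get_image_from_url("data:,hello"))
```

Leaving `base` unset mirrors the exercise, where the ‘base’ port is left unconnected.
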
  15. 15. 2: Adding a Workflow Description <ul><li>Right-click on a blank part of the workflow diagram and select “show details” </li></ul><ul><li>In the workflow explorer panel, the details page will open up. Add some details about the workflow (e.g. who is the author, what does the workflow do). </li></ul><ul><li>You can also add examples and descriptions for the workflow inputs by selecting them in the explorer panel and selecting “details” </li></ul><ul><li>Adding this metadata makes the workflow much more reusable </li></ul><ul><li>Save the workflow by going to “File -> save workflow” </li></ul>
  16. 16. Exercise 3: Adding New Services <ul><li>New services can be gathered from anywhere on the web </li></ul><ul><li>We will find a new service and add it to the workbench </li></ul><ul><li>IMPACT and SCAPE have a whole suite of services. We will add one (you will be using it later on today) </li></ul><ul><li>Go to . Here you will find a list of IMPACT services </li></ul><ul><li>Click on IMPACTTesseractV3Proxy and copy the link you are directed to. </li></ul><ul><li>This is the WSDL address and is what Taverna needs to run the service </li></ul>
  17. 17. 3. Adding New Services <ul><li>Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service </li></ul><ul><li>Select ‘WSDL service…’. A window will pop up asking for a web address </li></ul>
  18. 18. 3. Adding New Services <ul><li>Enter the service address you just copied </li></ul><ul><li>Scroll down the Services list and you will see your new service there </li></ul>
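
A WSDL address points to an XML document that describes the operations a web service offers, which is why a workflow tool can list a service's operations from the address alone. The sketch below reads the operation names from a small inline WSDL 1.1 fragment using only the Python standard library. The sample document and the operation names in it are hypothetical and are not taken from the IMPACT service; `list_operations` is an illustrative helper, not part of Taverna.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL document used only for illustration.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleService">
  <portType name="ExamplePortType">
    <operation name="recognizeText"/>
    <operation name="getVersion"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return the names of all operations declared in the WSDL portTypes."""
    root = ET.fromstring(wsdl_text)
    path = f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation"
    return [operation.get("name") for operation in root.findall(path)]


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))
```
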
  19. 19. Exercise 4: Sharing and Reusing Workflows <ul><li>Go to </li></ul><ul><li>myExperiment is a social networking site for sharing workflows and workflow expertise and experiences </li></ul><ul><li>Browse around the site and see what it contains </li></ul><ul><li>Find everything that has been tagged with ‘text mining’, for example </li></ul><ul><li>Look at the text mining workflows. You will see some that are specific to biology, some that are generally applicable, and some that are specific to other scientific disciplines </li></ul>
  20. 20. 4. Sharing and Reusing workflows <ul><li>IMPACT have many workflows on myExperiment, but they are not public. You must join an IMPACT group before you can see them and use them. </li></ul><ul><li>Create yourself an account and join the group called ‘IMPACT-myGrid-Hackathon’ (NOTE: you need to join this group to access content for future exercises) </li></ul><ul><li>Explore the shared items in this group. These are examples of the types of tasks IMPACT workflows can perform </li></ul>
  21. 21. 5. Using Workflows from myExperiment <ul><li>You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna </li></ul><ul><li>To use workflows from the website, you can either download them, or copy the workflow file location into the ‘open workflows from the web’ option in Taverna’s file menu. </li></ul>
  22. 22. 5. Using Workflows from myExperiment <ul><li>Go back to Taverna and click on the myExperiment icon at the top of the workbench </li></ul><ul><li>Go to ‘my stuff’ and log in (using the same credentials as the web page) </li></ul><ul><li>Find the IMPACT-myGrid-Hackathon group by using the ‘search’ option. </li></ul><ul><li>Look at the shared items and find the workflow called ‘Text to List’ </li></ul><ul><li>Click on ‘open’ and this workflow will be automatically imported into your Taverna design window </li></ul>
  23. 23. 5. Validate your Workflow <ul><li>Taverna checks to see that everything is connected properly and that all the required services are available </li></ul><ul><li>Go to the workflow explorer and click on ‘validation report’ </li></ul><ul><li>See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run. </li></ul><ul><li>If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab </li></ul><ul><li>If not, run the workflow </li></ul>
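
One of the checks described above, whether everything is connected properly, can be pictured as follows. This is a simplified sketch and not Taverna's validation logic: the data structures and the `find_unconnected_required_ports` helper are hypothetical, and the port names are borrowed from the earlier exercise purely as an example.

```python
def find_unconnected_required_ports(services, links):
    """Report required input ports that receive no data.

    services: {service name: {input port name: True if required, False if optional}}
    links:    set of (service name, input port name) pairs that have an incoming link
    """
    problems = []
    for service, ports in services.items():
        for port, required in ports.items():
            if required and (service, port) not in links:
                problems.append((service, port))
    return problems


if __name__ == "__main__":
    services = {"Get_image_from_URL": {"url": True, "base": False}}
    print(find_unconnected_required_ports(services, links=set()))
```

An empty result corresponds to a workflow with no connection errors; optional ports such as ‘base’ may stay unconnected.
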
  24. 24. 5. Using Workflows from myExperiment <ul><li>Use the default input suggested to run the workflow. The workflow will collect and list some example data stored at the given URL </li></ul><ul><li>It returns a list of image files </li></ul><ul><li>We can now combine this workflow with the one we made earlier to return the actual images. </li></ul><ul><li>In Taverna, you can add workflows as if they were any other kind of service – these are called ‘Nested Workflows’ </li></ul>
  25. 25. 6. Reusing and connecting Workflows <ul><li>From the current workflow design window, go to ‘Insert -> Nested workflow’ </li></ul><ul><li>Import the workflow you made earlier, by selecting ‘import from file’ </li></ul><ul><li>You can see a small version of the workflows, so you can check you are importing the correct workflow </li></ul>
  26. 26. 6. Reusing and connecting Workflows <ul><li>We now need to connect the two workflows together </li></ul><ul><li>Connect the Text2List service to the input of the nested workflow by dragging an arrow across. </li></ul><ul><li>Make a new workflow output port (by right-clicking and adding workflow output port) </li></ul><ul><li>Connect the output of the nested workflow to the new workflow output port </li></ul>
  27. 27. 6. Reusing and connecting Workflows <ul><li>Your new workflow should look something like this </li></ul><ul><li>Save and run the workflow </li></ul><ul><li>This time, as it runs, you will see Taverna automatically iterates over the list of data produced by Text2List </li></ul><ul><li>NOTE: some of the iterations will fail. See if you can tell which </li></ul><ul><li>Look at one of the resulting images </li></ul>
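
The automatic iteration seen in this exercise can be sketched in Python as follows: a service that expects a single value is called once per list item, and a failure on one item does not stop the others. The stand-in functions are assumptions made for illustration. In particular, splitting the text on line breaks and rejecting anything that does not start with "http" are invented behaviours, not those of the real ‘Text to List’ or image workflows.

```python
def text_to_list(text):
    """Stand-in for Text2List: split text into non-empty lines (assumed behaviour)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_image(url):
    """Stand-in for the nested workflow: fails for anything that is not an http URL."""
    if not url.startswith("http"):
        raise ValueError(f"not a fetchable URL: {url}")
    return f"<image bytes from {url}>"


def iterate(service, items):
    """Call the service once per item, keeping the error for any item that fails."""
    results = []
    for item in items:
        try:
            results.append(service(item))
        except Exception as error:  # record the failure and carry on with the rest
            results.append(error)
    return results


if __name__ == "__main__":
    urls = text_to_list("http://example.org/a.png\n\nnot-a-url\n")
    for result in iterate(fetch_image, urls):
        print(result)
```
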
  28. 28. 7. Looking at Intermediate Results <ul><li>You can track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from. </li></ul><ul><li>On the diagram, click the Text2List service and look at its inputs and outputs in the results. </li></ul><ul><li>You can save the workflow in myExperiment if you wish, but make sure you give credit to the nested workflow author and make sure you ONLY share it with the IMPACT-myGrid-Hackathon group </li></ul>
  29. 29. Advanced Exercises: Controlling data flow in Workflows
  30. 30. 8. Iteration <ul><li>As you have already seen, Taverna can automatically iterate over sets of data. </li></ul><ul><li>When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have: </li></ul><ul><li>A cross product – combining every item from list 1 with every item from list 2 - all against all </li></ul><ul><li>A dot product – only combining item 1 from list 1 with item 1 from list 2, and so on – line against line </li></ul>
  31. 31. 8. Iteration <ul><li>Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment </li></ul><ul><li>Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’) </li></ul><ul><li>Select the ‘ColourAnimals’ service, open ‘Details’ in the workflow explorer and select ‘configure list handling’ </li></ul><ul><li>Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product </li></ul>
  32. 32. 8. Iteration <ul><li>Run the workflow twice – once with ‘dot product’ and once with ‘cross product’. </li></ul><ul><li>Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product? </li></ul>
  33. 33. 9. Retries: Making your Workflow Robust <ul><li>Web services can sometimes fail due to network connectivity </li></ul><ul><li>If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow </li></ul><ul><li>Load the ‘Retry-Example’ workflow from the IMPACT-myGrid-Hackathon group. This workflow is designed to fail sometimes. </li></ul><ul><li>Run the workflow as it is and count the number of failed iterations </li></ul>
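
The two ways of combining lists can be illustrated in a few lines of Python. This is a simplified sketch of the idea rather than of Taverna's list handling: the results here are flat lists of pairs, and the example colours and animals are invented.

```python
from itertools import product


def cross_product(list1, list2):
    """All against all: every item of list1 paired with every item of list2."""
    return list(product(list1, list2))


def dot_product(list1, list2):
    """Line against line: item 1 with item 1, item 2 with item 2, and so on."""
    return list(zip(list1, list2))


if __name__ == "__main__":
    colours = ["red", "green"]
    animals = ["cat", "dog"]
    print(cross_product(colours, animals))  # 4 pairs
    print(dot_product(colours, animals))    # 2 pairs
```

With two lists of two items each, the cross product yields four combinations and the dot product yields two.
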
  34. 34. 9. Retries: Making your Workflow Robust <ul><li>Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel </li></ul><ul><li>Click on ‘advanced’ and ‘configure’ for retries </li></ul><ul><li>In the pop-up box, change it so that it retries each service iteration 2 times </li></ul><ul><li>Run the workflow again – how many failures do you get this time? </li></ul><ul><li>Change the workflow to retry 5 times – does it work every time now? </li></ul>
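
The effect of the retry setting can be sketched as below: a failed call is repeated up to a fixed number of extra times before the failure is reported. This is an illustration under stated assumptions, not Taverna's implementation. `call_with_retries` and `make_flaky` are hypothetical helpers, and the stand-in service fails a fixed number of times (rather than randomly) so that the example is reproducible.

```python
def call_with_retries(service, item, retries=2):
    """Call service(item); on failure, try again up to `retries` more times."""
    attempts = retries + 1
    for attempt in range(attempts):
        try:
            return service(item)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: report the failure


def make_flaky(failures_before_success):
    """Build a stand-in service that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def sometimes_fails(item):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("temporary failure")
        return item.upper()

    return sometimes_fails


if __name__ == "__main__":
    # Two failures followed by a success: two retries are just enough.
    print(call_with_retries(make_flaky(2), "ok", retries=2))
```

A service that fails more often than the retry count allows still produces an error, which is why raising the retry count in the exercise reduces failures without guaranteeing none.
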
  35. 35. 10. Looping <ul><li>From myExperiment, download and open the workflow “dummy_example_of_looping” </li></ul><ul><li>This workflow is asynchronous. This means that when you submit data (by running the workflow), it will return a jobID and place your job in a queue. This is very useful if your job will take a long time! </li></ul><ul><li>The ‘CheckStatus’ service will query your job ID to find out if it is complete </li></ul>
  36. 36. 10. Looping <ul><li>The default behaviour in a workflow is to call each service only once for each item of data – so what if your job has not finished when the ‘CheckStatus’ service asks? </li></ul><ul><li>Run the workflow </li></ul><ul><li>Almost every time, the workflow will ‘fail’ (in this case, that means it will return 0) because the results have not been returned before the workflow reaches the ‘getResults’ service </li></ul>
  37. 37. 10. Looping <ul><li>This is where looping is useful. Taverna can keep running the ‘CheckStatus’ service until it reports that the job is done. </li></ul><ul><li>Select the ‘CheckStatus’ service and click on the ‘details’ tab in the workflow explorer </li></ul><ul><li>Select ‘advanced’ and click on ‘add looping’ </li></ul><ul><li>Use the drop-down boxes in the looping window to set ‘state’ ‘is_not_equal_to’ RUNNING </li></ul>
  38. 38. 10. Looping <ul><li>Save the workflow and run it again </li></ul><ul><li>This time, the workflow will run until the ‘CheckStatus’ service reports that it is either COMPLETE, or it has an ERROR. </li></ul><ul><li>You will see results for ‘GetResults’, but you will still get an error for ‘GetResults2’. This is because there is one more configuration to change – we also need ‘Control Links’ </li></ul>
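
The looping configuration amounts to a polling loop: keep asking for the job state until it is no longer RUNNING. The sketch below shows that pattern in Python. The function name, the `max_polls` safety limit and the `delay_seconds` parameter are assumptions added for illustration, and the stand-in status service simply replays a fixed sequence of states.

```python
import time


def loop_until_not_running(check_status, job_id, max_polls=100, delay_seconds=0.0):
    """Poll check_status(job_id) until the state is anything other than RUNNING."""
    for _ in range(max_polls):
        state = check_status(job_id)
        if state != "RUNNING":
            return state  # e.g. COMPLETE or ERROR
        time.sleep(delay_seconds)
    raise TimeoutError(f"job {job_id} still RUNNING after {max_polls} polls")


if __name__ == "__main__":
    # Stand-in status service: RUNNING twice, then COMPLETE.
    states = iter(["RUNNING", "RUNNING", "COMPLETE"])
    print(loop_until_not_running(lambda job_id: next(states), job_id="job-1"))
```
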
  39. 39. 11. Control Links <ul><li>A control link specifies that there is a dependency of one service on another even though there is no data flowing between them. </li></ul><ul><li>A control link is a line with a white circle at the end that connects two services (see the link between ‘CheckStatus’ and ‘getResults’) </li></ul>
  40. 40. 11. Control Links <ul><li>We will add a control link to getResults2 </li></ul><ul><li>Right-click on getResults2 and select ‘Run after’ from the drop-down menu. </li></ul><ul><li>Set it to ‘Run after’ -> ‘CheckStatus’ </li></ul><ul><li>Save and run the workflow </li></ul><ul><li>Now you will see both results returned </li></ul>
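
A control link is an ordering constraint: one service must not start until another has finished, even though no data passes between them. One way to picture this, assuming Python 3.9 or later, is to compute a valid run order from the ‘Run after’ relationships with the standard library's `graphlib`. The `execution_order` helper is illustrative and is not how Taverna schedules services.

```python
from graphlib import TopologicalSorter


def execution_order(run_after):
    """Return one valid run order.

    run_after maps each service to the set of services it must run after.
    """
    return list(TopologicalSorter(run_after).static_order())


if __name__ == "__main__":
    run_after = {
        "getResults": {"CheckStatus"},
        "getResults2": {"CheckStatus"},
    }
    print(execution_order(run_after))
```

‘CheckStatus’ always comes first; the two result services may appear in either order because nothing constrains them relative to each other.
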