2. Exercise 1: Installing the Workbench
• Taverna can be downloaded from http://www.mygrid.org.uk/
• Go to the page and click on "download Taverna 2.1"
• Download the correct version for your operating system
• Follow the instructions in the Taverna installer
Katy Wolstencroft, myGrid, University of Manchester
3. Exercise 1: Installing on Windows
• If you have administrator rights, you can download the Windows installer. If not, you can use the Windows archive instead
• Follow the instructions in the Taverna installer
• Once installed, run Taverna from the new directory (there will also be a shortcut in the Start menu)
4. Exercise 1: Installing on Mac and Linux
• Mac: download and open the OS X disk image. A Finder window will appear when you open the disk image, then you can drag Taverna into your Applications folder
• Linux: download the Linux archive and unpack it using tar zxfv taverna-workbench-2.1.b2.tar.gz, or by double-clicking it in your desktop environment. Make sure you have the Sun Java 1.6 JRE and Graphviz installed
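Before launching Taverna on Linux, it can help to confirm that the two prerequisites are reachable. The following is a minimal sketch, not part of the Taverna distribution. It assumes the Java runtime is on the PATH as "java" and Graphviz as "dot", and it only checks that the executables exist, not that Java is version 1.6.

```python
import shutil

# Assumption: the JRE is exposed as "java" and Graphviz as "dot" on PATH.
REQUIRED_TOOLS = {"java": "Java runtime (JRE)", "dot": "Graphviz"}


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return descriptions of the required tools that are not found on PATH."""
    return [label for exe, label in required.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Java and Graphviz were both found on PATH")
```

The lookup function is passed in as a parameter so that the check can be exercised without changing the real environment.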
6. 1. Workflow Explorer
• The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow.
• The Workflow Explorer is also where you find configuration details of services and advanced options like iteration and looping. We will come back to these later
7. 1. Workflow Diagram
The visual representation of the workflow
• Shows inputs / outputs, services and control flows
• Allows editing of the workflow by dragging, dropping and connecting services together
• Enables saving of workflow diagrams for publishing and sharing
8. 1. Available Services Panel
Lists services available by default in Taverna
• ~3500 services
  – Local Java services
  – Simple web services
  – Soaplab services (legacy command-line applications)
  – R Processor
  – BioMart database services
  – BioMoby services
  – Beanshell processor
Allows the user to add new services or workflows from the web or from file systems
9. Exercise 2: Adding New Services
New services can be gathered from anywhere on the web. The default list contains just a few that we already know about, and importing others is very straightforward.
The myGrid project in Manchester and the EBI have recently initiated the BioCatalogue project.
The BioCatalogue is a public, curated catalogue of Life Science web services
10. Exercise 2: Adding New Services
Go to http://www.biocatalogue.org and explore
Through the BioCatalogue you can find, register, or annotate web services
11. Exercise 2: Adding New Services
• Type "blast" into the Search box in the BioCatalogue
• Select the Blast service from DDBJ
There it is!
12. Exercise 2: Adding New Services
• Clicking on the Blast service brings you to the page describing the service and its operations
• Copy the service WSDL location
This is what Taverna needs...
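A WSDL file is an XML description of a web service, and the operations it declares are what a client can call. As a rough illustration of what "the service and its operations" means, the sketch below lists the operation names found in a WSDL 1.1 document. The embedded WSDL is a hypothetical, heavily trimmed example and is not the real DDBJ description. This is also not how Taverna itself reads WSDL files.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL used only for illustration.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleBlast">
  <portType name="ExampleBlastPort">
    <operation name="searchSimple"/>
    <operation name="exampleOtherOperation"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return the operation names declared in the portType elements of a WSDL."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter("{%s}portType" % WSDL_NS):
        for operation in port_type.findall("{%s}operation" % WSDL_NS):
            names.append(operation.get("name"))
    return names


if __name__ == "__main__":
    for name in list_operations(SAMPLE_WSDL):
        print(name)
```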
13. Exercise 2: Adding New Services
• Go to the Services panel in Taverna and click "import new services". For each type of service, you are given the option to add a new service
• Select "WSDL service..." A window will pop up asking for a web address
14. Exercise 2: Adding New Services
• Enter the Blast web service address you just copied
• Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included
15. Exercise 3: Building a Simple Workflow
Go to the Services panel
• Type "Fasta" into the "search" box at the top of the panel
• You will see several services in the search results
• Select "Get Protein FASTA". This service returns a protein sequence in FASTA format from a database if you supply it with a sequence ID
• Drag this service across to the workflow explorer panel
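For readers who have not met FASTA format before: each record is a header line starting with ">" followed by one or more lines of sequence. The sketch below parses such text into (header, sequence) pairs. The example record is made up and is not the real entry for any identifier used in this tutorial.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only.
EXAMPLE = ">example_id hypothetical protein\nMKTAYIAKQR\nQISFVKSHFS\n"

if __name__ == "__main__":
    for header, sequence in parse_fasta(EXAMPLE):
        print(header, len(sequence))
```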
16. Exercise 3: Building a Simple Workflow
• In a blank space in the workflow diagram, right-click and select "Add Workflow Input Port"
• Type in a name for this input (e.g. ID) and click "OK"
• Do the same to create a new workflow output. Call this output "sequence"
17. Exercise 3: Building a Simple Workflow
• You now have 3 boxes in the diagram and we need to connect them up
• Click on the input box, drag towards "Get Protein FASTA" and let go. An arrow will connect the two boxes
18. Exercise 3: Building a Simple Workflow
• Click on the output box, drag towards "Get Protein FASTA", and let go. An arrow will connect the two boxes
• You have now built your first workflow!
• It should look something like this
19. Exercise 3: Building a Simple Workflow
• Run the workflow by selecting "File -> run workflow", or by clicking on the play button at the top of the workbench
20. Exercise 3: Building a Simple Workflow
An input window will appear. As you can see, we have not yet added a description of the workflow or of the input
Click on "New Value" in the input window and add a GenBank GI number (e.g. 1220173) where it says "some input data goes here"
21. Exercise 3: Building a Simple Workflow
• Click "run workflow"
• In the bottom left of the results window, click on the results. You will now see a protein sequence from GenBank
• In the Services panel, search for "blast"
• Find the result "SearchSimple - Execute Blast" and drag it across to the workflow panel (this is the service we added at the beginning)
22. Exercise 3: Building a Simple Workflow
• Now we have 2 services to connect into a workflow. We will connect "Get_protein_fasta" to "SearchSimple" by right-clicking "Get_protein_fasta" and selecting "link from output output_text"
• You will get an arrow. Drag the arrow to "SearchSimple". A box will appear asking which port you want to connect to. Select "query". Now the services are connected
23. Exercise 3: Building a Simple Workflow
• If you show the service ports, you can connect directly from an output port on one service to an input port on another
• Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to abc)
24. Exercise 3: Building a Simple Workflow
• We need to finish building the workflow by adding inputs and outputs
• Right-click on "SearchSimple -> Result" and select "connect as input to..New Workflow Output Port"
25. Exercise 3: Building a Simple Workflow
• Taverna will suggest a name for the output. If this is OK, select "OK"
• Add two new workflow inputs (called "database" and "program") and connect these to "database" and "program" in SearchSimple
26. Exercise 3: The Finished Workflow!
• Your workflow should look something like this
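Since the diagram itself is not reproduced here, the data flow described in the preceding steps can be sketched in code. This is only an illustration of how values move along the links (ID into Get_protein_fasta, its output_text into the SearchSimple query port, alongside database and program). The two service functions are stubs invented for the example and make no real service calls, and returning the earlier "sequence" output assumes you kept that link.

```python
def run_workflow(inputs, get_protein_fasta, search_simple):
    """Pass data along the links described in the exercise."""
    # Workflow input "ID" feeds the Get_protein_fasta service.
    fasta = get_protein_fasta(inputs["ID"])
    # Its output_text feeds the "query" port of SearchSimple.
    result = search_simple(
        query=fasta,
        database=inputs["database"],
        program=inputs["program"],
    )
    # Assumption: the earlier "sequence" output is still connected.
    return {"sequence": fasta, "Result": result}


# Stubs standing in for the real services (hypothetical behaviour).
def fake_get_protein_fasta(seq_id):
    return ">%s\nMKT" % seq_id


def fake_search_simple(query, database, program):
    return "%s search of %s with %d-character query" % (program, database, len(query))


if __name__ == "__main__":
    example_inputs = {"ID": "1220173", "database": "SWISS", "program": "blastp"}
    print(run_workflow(example_inputs, fake_get_protein_fasta, fake_search_simple))
```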
27. Exercise 3: Adding a Workflow Description
• Right-click on a blank part of the workflow diagram and select "show details"
• In the workflow explorer panel, the details page will open up. Add some details about the workflow, e.g. who the author is and what it does
• You can also add examples and descriptions for the workflow inputs by selecting them and selecting "details"
• An example for database is "SWISS", for program "blastp", and for ID "1220173"
• Save the workflow by going to "File -> save workflow"
28. Exercise 4: Running the Workflow
• Go to "File -> run workflow". A workflow input window will appear as before
• This time, each input has its own tab with descriptions and examples, as well as a panel to enter data
• In the fasta_id input, select "New value" and add a GenBank GI number (e.g. 1220173)
• In the database input, add "SWISS"
• In the program input, add "blastp"
• Select "run workflow" at the bottom of the panel to set the workflow going
29. Exercise 4: Running the Workflow with Multiple Inputs
• Taverna 2 has type-checking built into the workflow. Before you execute, it will check that all of your input and output values are syntactically correct (i.e. single values or lists). Semantic type checking will also be added in the next few months.
• Because of this, you have to declare the type of input you want for the workflow (we have declared single values by default)
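The distinction between single values and lists can be thought of as a nesting depth: 0 for a single value, 1 for a list of values, and so on. The sketch below illustrates that idea only. It is not Taverna's type checker, and it assumes that lists are uniformly nested and that an empty list counts as depth 1.

```python
def depth(value):
    """Nesting depth: 0 for a single value, 1 for a list of values, and so on."""
    if isinstance(value, list):
        # Assumption: an empty list counts as depth 1, and all items of a
        # non-empty list share the depth of the first item.
        return 1 + (depth(value[0]) if value else 0)
    return 0


def matches_declared_depth(value, declared):
    """Check a supplied value against the depth declared for an input port."""
    return depth(value) == declared


if __name__ == "__main__":
    print(matches_declared_depth("1220173", 0))
    print(matches_declared_depth(["1220173", "37722019"], 0))
```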
30. Exercise 4: Running the Workflow with Multiple Inputs
• Go back to the Blast workflow and right-click on the "Get_protein_fasta_ID" input port. Select "edit workflow input port"
• Change the depth to 1. This will allow you to add a list of inputs to the workflow
31. Exercise 4: Running the Workflow with Multiple Inputs
• Run the workflow again (notice it has remembered the values you added last time). Add another GI number, for example 37722019
This time the workflow will iterate over both
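Iterating over a list means that a service expecting a single value is called once per list item, and the results are collected into a list of the same shape. The sketch below mimics that behaviour with a stub service. It is a simplified illustration under that assumption and does not cover how Taverna combines several list inputs.

```python
def call_with_iteration(service, value):
    """Call a single-value service once per item when given a list."""
    if isinstance(value, list):
        return [call_with_iteration(service, item) for item in value]
    return service(value)


# Stub service invented for the example; it performs no real lookup.
def fake_lookup(gi_number):
    return "sequence-for-" + gi_number


if __name__ == "__main__":
    print(call_with_iteration(fake_lookup, ["1220173", "37722019"]))
```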
32. Exercise 5: Looking at Intermediate Results - Provenance
• As Taverna 2 workflows run, the provenance of the workflow is recorded
• When a workflow is complete, you can look at intermediate results by selecting a service in the workflow results diagram panel
33. Exercise 5: Looking at Intermediate Results - Provenance
• An intermediate results window will pop up showing iterations and the relationships between inputs and outputs for that service
34. Exercise 6: Sharing Workflows
• Go to http://www.myexperiment.org
• myExperiment is a social networking site for sharing workflows and workflow expertise and experiences
• Browse around the site and see what it contains
• Create yourself an account and (if you wish) join the group called Tutorial (a useful place to share items from today)
35. Exercise 6: Sharing Workflows
• Find all the workflows containing BLAST searches. How did you find them? How many are there? Can they all be downloaded?
• Which is the most downloaded workflow?
• Which is the most viewed workflow? Is it the same?
• Explore the workflow packs. How many packs feature workflows for microarray analysis?
36. Exercise 7: Workflow Reuse and Nested Workflows
• Reload your BLAST workflow from Exercise 3
• We will extend this workflow to provide a protein domain and motif search
• In myExperiment, find all the workflows that perform InterproScan searches
• Select and download the EBI_InterproScan (T2 Looping) workflow. It is workflow number 820 and it was uploaded by Stian Soiland-Reyes
37. Exercise 7: Workflow Reuse - Nested Workflows
• Go back to Taverna and look at the Blast workflow
• Add a nested workflow by clicking on a blank part of the diagram, selecting "Add Nested Workflow" and choosing the workflow you have just downloaded
• You need to connect up the nested workflow as if it were any other kind of service
38. Exercise 7: Workflow Reuse - Nested Workflows
• The nested workflow has 2 inputs and 5 outputs. We need to connect both inputs, but we can choose which outputs to connect
• In the outer workflow, create a new input called "EmailAddress" and a new output called "InterproScan_out"
• Connect EmailAddress to the nested workflow input "Email_address"
39. Exercise 7: Workflow Reuse - Nested Workflows
• Connect the nested workflow output "InterproScan_text_result" to the new "InterproScan_out" output
• Connect the output of "GetProteinFasta" to the nested workflow input "Sequence_or_ID"
40. Exercise 7: Workflow Reuse - Nested Workflows
• Save the workflow and run it
• Look at the results
• This time, you will have Blast results and InterproScan results
• If you save the workflow back on myExperiment, make sure you attribute the nested workflow author
41. Exercise 7: Workflow Reuse - Nested Workflows
The nested workflow we added is an example of using an asynchronous service. The service first produces an ID, which is then used to poll the service every few seconds to see whether it has finished. We use Taverna's Looping mechanism to do this. We will come back to this later on
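The submit-then-poll pattern described here can be sketched outside Taverna as a simple loop. The code below is an illustration of the pattern only, not of Taverna's Looping mechanism or of the InterproScan service interface. The function names, the "DONE" status string, the polling interval and the in-memory FakeService are all assumptions made for the example.

```python
import time


def run_async_job(submit, get_status, get_result, data,
                  poll_interval=5.0, max_polls=100, sleep=time.sleep):
    """Submit a job, poll until it reports DONE, then fetch the result."""
    job_id = submit(data)
    for _ in range(max_polls):
        if get_status(job_id) == "DONE":
            return get_result(job_id)
        sleep(poll_interval)
    raise TimeoutError("job %s did not finish after %d polls" % (job_id, max_polls))


class FakeService:
    """In-memory stand-in that reports DONE on the third status check."""

    def __init__(self):
        self.checks = 0
        self.data = None

    def submit(self, data):
        self.data = data
        return "job-1"

    def status(self, job_id):
        self.checks += 1
        return "DONE" if self.checks >= 3 else "RUNNING"

    def result(self, job_id):
        return "result for " + self.data


if __name__ == "__main__":
    service = FakeService()
    print(run_async_job(service.submit, service.status, service.result,
                        "MKT", poll_interval=0.1))
```

Capping the number of polls keeps the loop from running forever if the service never reports completion.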
42. Acknowledgements & thanks
All the myGrid team
Especially
• Katy Wolstencroft
• Shoaib Sufi
• Peter Li
• Paul Fisher
• Eric Nzuobontane
Editor's Notes
Integrating biological data of various types and developing suitable bioinformatics tools are critical objectives for enabling research at the systems level. The European Network of Excellence ENFIN is developing an infrastructure that connects databases and platforms to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. Beyond the use of common standards to format individual datasets, there is a need for sophisticated informatics platforms that enable data mining across various domains, sources, formats and types. The aim of the EnCORE project is to integrate, across different disciplines, an extensive list of database resources and analysis tools in a computationally accessible and extensible manner, facilitating automated data retrieval and processing with a special focus on systems biology. The EnCORE platform is available as a collection of web services with a common standard format that is easy to integrate into workflow management software such as Taverna. EnCORE services are also accessible through EnVISION, a graphical web user interface providing detailed information such as molecular interactions, biological pathways and computational models of pathways.