Taverna tutorial

Rafael Jimenez
rafael@ebi.ac.uk
Taverna Workflow Management System
Proteomics Bioinformatics
Wellcome Trust Genome Campus
Hinxton, Cambridge, UK
15-19 July 2011

Exercises
• Search for workflows using PICR in myExperiment
• Look for workflows prepared for this workshop
• Open a workflow from myExperiment and run it in Taverna
• Try more workflows from myExperiment
• Build a workflow of nested workflows
• Running a workflow several times
• Running workflows from the command line

EXERCISE 1
Search for workflows using PICR in myExperiment

• Let’s find workflows using PICR (The Protein Identifier Cross-
Reference Service)
– Open Taverna
– Click on the “myExperiment” button
– Click on “Search” tab
– Search for “EBI PICR”
1. How many workflows were built using Taverna 2?
2. Which workflow includes other nested workflows?
3. Which workflow could you use to find UniProt cross-references for
an Ensembl protein accession?

EXERCISE 2
Look for workflows prepared for this workshop

• We grouped some interesting workflows for this workshop in a
“pack”. Let’s have a look.
– Open Taverna
– Click on the “myExperiment” button
– Click on “Search” tab
– Tick only the “packs” checkbox
– Search for “proteomics bioinformatics”
– Click on the “Preview” button to see the selected workflows
available in the “Proteomics Bioinformatics Workshop 2011”
pack
– Click on one workflow to see more information
1. How many workflows are in the pack about DAS?
2. What is DAS?

EXERCISE 3
Open a workflow from myExperiment and run it in Taverna

• Let’s open and run one of these workflows in Taverna. For instance,
let’s find Ensembl and SwissProt cross-references for the protein
accession “TGFR2_HUMAN”.
– In the “Proteomics Bioinformatics Workshop 2011” pack, find and
click on the “EBI PICR” workflow
– Read the description to get familiar with the workflow
– Click on the “Open” button
– Have a look at the graphical representation of the workflow
• The important thing to notice now is that this workflow has 4 inputs
(protein accession, active accessions, taxonomy identifier and
databases) and 1 output (a table with the mapping results)
– In the menu, click on “File” and then “Run workflow”
• Set a value for each input (read the “Port description” and
“Example value” for advice). Use the values below; note that the
“searchDatabases” input takes two values:
– onlyActive: true
– Protein_accessions: TGFR2_HUMAN
– searchDatabases: SWISSPROT and ENSEMBL
– taxonId: 9606
• Click on the “Run workflow” button
– When the run finishes, check the results in the “mapping_table”
tab by selecting “Value 1”
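Taverna 2.x also ships a command-line tool (executeworkflow) that takes the same input values as flags. The sketch below is a minimal illustration, assuming the workflow has been saved locally as “EBI_PICR.t2flow” (a hypothetical file name). It passes single values with -inputvalue and uses -inputdelimiter so that “SWISSPROT,ENSEMBL” is split back into the two-item list that “searchDatabases” expects. Check “executeworkflow -help” for the options available in your version.

```python
from typing import Mapping, Sequence, Union

# A port value is either a single string or a list of strings.
InputValue = Union[str, Sequence[str]]


def build_executeworkflow_args(
    workflow_file: str,
    inputs: Mapping[str, InputValue],
    delimiter: str = ",",
    launcher: str = "executeworkflow.bat",
) -> list[str]:
    """Assemble an argument list for the Taverna 2.x command-line tool."""
    args = [launcher, "-embedded"]
    for port, value in inputs.items():
        if isinstance(value, str):
            # Single value: passed as-is to the named input port.
            args += ["-inputvalue", port, value]
        else:
            # List value: join it, and tell Taverna to split it on the delimiter.
            args += ["-inputdelimiter", port, delimiter]
            args += ["-inputvalue", port, delimiter.join(value)]
    args.append(workflow_file)
    return args


# Input values from this exercise; "EBI_PICR.t2flow" is a hypothetical file name.
picr_inputs = {
    "onlyActive": "true",
    "Protein_accessions": "TGFR2_HUMAN",
    "searchDatabases": ["SWISSPROT", "ENSEMBL"],
    "taxonId": "9606",
}
print(" ".join(build_executeworkflow_args("EBI_PICR.t2flow", picr_inputs)))
```

The printed line is the kind of command you would type in the Taverna folder, as described in Exercise 7.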

EXERCISE 4
Try more workflows from myExperiment

• Following Exercise 3, open and run other workflows from the
“Proteomics Bioinformatics Workshop 2011” pack that might be
interesting to you.

EXERCISE 5
Build a workflow of nested workflows
Reactome & Biomodels

• Let’s put two workflows together. For instance, let’s find biological
models from Biomodels and pathways from Reactome using a
Gene Ontology term as a query.
– Create a new workflow (Menu -> File/New workflow)
– From the list of workflows, available in the “Proteomics
Bioinformatics Workshop 2011” pack, click on “Find Biological
Models by GO”.
• Click on the “Import” button (as a nested workflow)
– From the same list, click on “Find Reactome pathways
and reactions by GO” and import it in the same way.
– Click on the “Design” tab to see the nested workflows
– Create a new input
• In the menu: Insert/Input Workflow port
– Name: goAccession
– Type: Single value
– Click on the input box and drag towards “GoId” (Biomodels
input) and let go. An arrow will connect the two boxes.
– Click on the input box and drag towards “Go_accession”
(Reactome input) and let go. An arrow will connect the two boxes.

– Create a new output to store Biomodels names
– Name: biomodels
– Create a new output to store Reaction names
– Name: reactomeReactions
– Create a new output to store Pathway names
– Name: reactomePathways
– Connect biomodels with MoldelsNames
– Connect reactomeReactions with reaction_name
– Connect reactomePathways with pathway_name
– Try the workflow using a GO accession as input: “0006915”
– Run the workflow and check the results (values for biomodels,
reactomePathways and reactomeReactions)
1. How many models were found?
2. How many reactions?
3. How many pathways?
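Behind the Design view, the workflow you just wired is a small dataflow graph: the single “goAccession” input fans out to both nested workflows, and three nested outputs feed the three workflow outputs. Below is a minimal Python sketch of that graph, using a hypothetical (source, sink) representation rather than Taverna's own t2flow format; “Biomodels” and “Reactome” are made-up labels for the two nested workflows. It checks that every workflow input is used and that each workflow output receives exactly one link, assuming, as in the Taverna workbench, that an output port accepts a single incoming link.

```python
from collections import Counter

# Hypothetical model of the Exercise 5 wiring; not Taverna's t2flow format.
WORKFLOW_INPUTS = {"goAccession"}
WORKFLOW_OUTPUTS = {"biomodels", "reactomeReactions", "reactomePathways"}

# (source, sink) pairs; nested-workflow ports are written "Nested.port".
LINKS = [
    ("goAccession", "Biomodels.GoId"),
    ("goAccession", "Reactome.Go_accession"),
    ("Biomodels.MoldelsNames", "biomodels"),
    ("Reactome.reaction_name", "reactomeReactions"),
    ("Reactome.pathway_name", "reactomePathways"),
]


def wiring_problems(links, inputs, outputs):
    """Return human-readable problems; an empty list means the wiring is complete."""
    problems = []
    sources = {src for src, _ in links}
    sink_counts = Counter(dst for _, dst in links)
    # Every workflow input should feed at least one port.
    for port in sorted(inputs - sources):
        problems.append(f"input {port!r} is not connected to anything")
    # Every workflow output should receive exactly one incoming link.
    for port in sorted(outputs):
        if sink_counts[port] != 1:
            problems.append(
                f"output {port!r} has {sink_counts[port]} incoming links (expected 1)"
            )
    return problems


print(wiring_problems(LINKS, WORKFLOW_INPUTS, WORKFLOW_OUTPUTS))  # → []
```

Dropping any of the last three links makes the check report the corresponding output as unconnected, which is the same mistake the workbench would show as an output box with no arrow.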

EXERCISE 6
Running a workflow several times
BLASTing multiple protein sequences

• Sometimes we want to run a workflow several times. This example
shows you how to run BLAST over a list of sequences.
– Find a workflow to make Blast searches
• Create a new workflow (Menu -> File/New workflow)
• Click on the “myExperiment” button
• Click on “Search” tab
• Search for “EBI NCBI BLAST”
• Find the workflow from “Katy Wolstencroft” for “Taverna2”
– Find a workflow to convert an input with multiple FASTA
sequences into a list of FASTA sequences
• Click on the “myExperiment” button
• Click on “Search” tab
• Search for “split fasta sequence”
• Find the “Fasta_string_to_fasta_list” workflow from “Hamish McWilliam” for
“Taverna1”
– Connect the “fasta_list” output with the “sequence” input
• Click on the “fasta_list” output box and drag towards the “sequence” input
box and let go. An arrow will connect the two boxes.

– Create a new input
– Name: email
– Click on the new “email” input box and drag towards “email”
(EBI_NCBI_BLAST input) and let go. An arrow will connect the
two boxes.
– Create a new output for each “EBI_NCBI_BLAST” output
– Name: choose a name
– Try the workflow using your email address as input and a couple of
protein sequences in FASTA format
– Example of two sequences in FASTA format:
>protein1
CIMKEKKKPGMCSYSETFFMCSCSSDECNDNIIFSEEYNTSNPDLLLVIFQVTGISLLPPSVIIIFYCYRVNRQQKLSSTWETG
KTRKLMEFSEHCAMCSYSIILEDDRSDISSTCANNI
>protein2
SKANGEANRLLTNARITDSSIWSPQPGQHISIQMCSYSTYRELNPAPTSSPTSTRTEIQLNGENSRSTADLPMIHM
– Run the workflow and check the results
– If you want to run BLAST searches for more than 10 sequences, I
recommend running the workflow with the command-line tool.
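The “Fasta_string_to_fasta_list” step is what makes the repeated runs possible: it turns one multi-FASTA string into a list, and, assuming the BLAST workflow's “sequence” input expects a single sequence, Taverna then calls BLAST once per list item (implicit iteration). Below is a minimal Python sketch of both ideas, assuming each record starts with a “>” header line. It is not the code of the myExperiment workflow, and the lambda standing in for the BLAST service is hypothetical.

```python
def fasta_string_to_list(fasta_text: str) -> list[str]:
    """Split a multi-FASTA string into one FASTA record per list item."""
    records: list[list[str]] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            records.append([line])  # a header line starts a new record
        elif records:
            records[-1].append(line)  # sequence lines belong to the current record
        else:
            raise ValueError("sequence data found before the first '>' header")
    return ["\n".join(record) for record in records]


def run_per_item(sequences, search):
    """Mimic implicit iteration: call a single-sequence service once per item."""
    return [search(seq) for seq in sequences]


# The two example sequences from this exercise.
example = """>protein1
CIMKEKKKPGMCSYSETFFMCSCSSDECNDNIIFSEEYNTSNPDLLLVIFQVTGISLLPPSVIIIFYCYRVNRQQKLSSTWETG
KTRKLMEFSEHCAMCSYSIILEDDRSDISSTCANNI
>protein2
SKANGEANRLLTNARITDSSIWSPQPGQHISIQMCSYSTYRELNPAPTSSPTSTRTEIQLNGENSRSTADLPMIHM
"""

records = fasta_string_to_list(example)
print(len(records))  # → 2
# Hypothetical stand-in for the BLAST service: report each record's header.
print(run_per_item(records, lambda rec: rec.splitlines()[0]))  # → ['>protein1', '>protein2']
```

Note that protein1 spans two sequence lines and still ends up as a single list item, since a new record only starts at the next “>” header.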

EXERCISE 7
Running workflows from the command line
BLASTing multiple protein sequences

Sometimes Taverna can run into memory problems. For instance, if
we send many BLAST queries, Taverna will struggle to keep all the
results in memory. One way to avoid this problem is to run the workflow
we have created from the command line and store the results in files.
Following the BLAST exercise, if you want to run a search for more
than 10 sequences, I recommend running the workflow from the
command line.
• Copy your workflow and data into the Taverna folder
– Copy your workflow into the Taverna folder
• C:\Program Files\Taverna Workbench 2.3.0
– Create a file (yourfilename.txt) with your FASTA sequences.
– Save the file in the Taverna folder
• C:\Program Files\Taverna Workbench 2.3.0

• Run the workflow from the command line
– Open the command line
• Start / Run
• In the “Open” field, type “cmd” and click on “OK”
– Get into the Taverna directory
• If the prompt does not show “C:\>”, type “c:” and press Enter.
• Get into the Taverna folder
– Type “cd C:\Program Files\Taverna Workbench 2.3.0”
– Run the workflow
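• Type “executeworkflow.bat -embedded -inputvalue email your@email.com
-inputfile SEQUENCE_INPUT yourfilename.txt EBI_NCBI_BLAST-multifasta.t2flow”,
replacing SEQUENCE_INPUT with the name of the workflow input that receives the FASTA sequences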
– More documentation about how to use the command-line tool:
http://www.taverna.org.uk/documentation/taverna-2-x/command-line-tool/2-3/
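If you need to launch many such runs, the command can also be started from a script. Below is a minimal Python sketch, assuming it is run from the Taverna folder. The workflow file name comes from this exercise, while “SEQUENCE_INPUT” is a placeholder for the name of your workflow's FASTA input port. It uses only the -embedded, -inputvalue and -inputfile options, with -inputfile given as an input name followed by a file name.

```python
import subprocess
from pathlib import Path

# Placeholders: replace with your own workflow file and the name of its
# FASTA input port (SEQUENCE_INPUT is hypothetical).
WORKFLOW = "EBI_NCBI_BLAST-multifasta.t2flow"
FASTA_INPUT_PORT = "SEQUENCE_INPUT"
LAUNCHER = "executeworkflow.bat"


def blast_command(email: str, fasta_file: str) -> list[str]:
    """Build the argument list for one command-line run of the BLAST workflow."""
    return [
        LAUNCHER,
        "-embedded",                                # use an embedded Derby database
        "-inputvalue", "email", email,              # value typed on the command line
        "-inputfile", FASTA_INPUT_PORT, fasta_file, # value read from a file
        WORKFLOW,
    ]


if __name__ == "__main__":
    cmd = blast_command("your@email.com", "yourfilename.txt")
    if Path(LAUNCHER).exists():
        # Batch files can be run without shell=True; check=True raises on failure.
        subprocess.run([str(Path(LAUNCHER).resolve()), *cmd[1:]], check=True)
    else:
        print("Taverna launcher not found in this folder; the command would be:")
        print(" ".join(cmd))
```

When the launcher is not found, the script only prints the command, so you can check it before running anything.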

Proteomics Services Team
Acknowledgements & thanks
All the myGrid team
Special thanks to Katy Wolstencroft, Shoaib
Sufi, Peter Li and Eric Nzuobontane
