Rafael Jimenez
rafael@ebi.ac.uk
Taverna Workflow Management System
Proteomics Bioinformatics
Wellcome Trust Genome Campus
Hinxton, Cambridge, UK
15-19 July 2011
Exercises
• Search workflows using PICR in MyExperiment
• Look for workflows prepared for this workshop
• Open a workflow from MyExperiment and run it in
Taverna
• Try more workflows from MyExperiment
• Build a workflow of nested workflows
• Running a workflow several times
• Running workflows from the command line
EXERCISE 1
Search workflows using PICR in MyExperiment
• Let’s find workflows using PICR (The Protein Identifier Cross-
Reference Service)
– Open Taverna
– Click on the “myExperiment” button
– Click on “Search” tab
– Search for “EBI PICR”
1. How many workflows were built using Taverna 2?
2. Which workflow includes other nested workflows?
3. Which workflow could you use to find UniProt cross-references for
an Ensembl protein accession?
EXERCISE 2
Look for workflows prepared for this workshop
• We grouped some interesting workflows for this workshop in a
“pack”. Let’s have a look.
– Open Taverna
– Click on the “myExperiment” button
– Click on “Search” tab
– Just tick the “packs” checkbox
– Search for “proteomics bioinformatics”
– Click on the “Preview” button to see the selected workflows
available in the “Proteomics Bioinformatics Workshop 2011”
pack
– Click on one workflow to see more information
1. How many workflows are in the pack about DAS?
2. What is DAS?
EXERCISE 3
Open a workflow from MyExperiment and run it in Taverna
• Let’s open and run one of these workflows in Taverna. For instance
let’s find Ensembl and SwissProt cross-references for the protein
accession “TGFR2_HUMAN”
– In the “Proteomics Bioinformatics Workshop 2011” pack find and
click on the “EBI PICR” workflow
– Read the description to get familiar with the workflow
– Click on the “Open” button
– Have a look at the graphical representation of the workflow
• What is important now is to recognize that there are 4 inputs for this workflow
(protein accession, active accessions, taxonomy identification and
databases) and 1 output (a table with mapping results)
– In the menu click on “File” and “Run workflow”
• Set a value for each input (Read the content of “Port description” and
“Example value” for some advice). For the “searchDatabases” input you will
have to set two values:
– onlyActive: true
– Protein_accessions: TGFR2_HUMAN
– searchDatabases: SWISSPROT and ENSEMBL
– taxonId: 9606
• Click on the “Run workflow” button
– Run the workflow and check the results in the “mapping_table”
tab selecting “Value 1”
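The input values above can be summarised as a small data structure. This is a minimal sketch in Python and is not part of Taverna: the dictionary `picr_inputs` and the helper `describe_inputs` are hypothetical names used only for illustration. It shows that “searchDatabases” holds a list of two values while the other three ports hold a single value each.

```python
# Hypothetical summary of the Exercise 3 input ports; port names mirror the slide.
picr_inputs = {
    "onlyActive": "true",
    "Protein_accessions": "TGFR2_HUMAN",
    "searchDatabases": ["SWISSPROT", "ENSEMBL"],  # one port, two values
    "taxonId": "9606",
}


def describe_inputs(inputs):
    """Return one line per port, marking ports that hold a list of values."""
    lines = []
    for port, value in inputs.items():
        if isinstance(value, list):
            lines.append(f"{port}: list of {len(value)} values ({', '.join(value)})")
        else:
            lines.append(f"{port}: single value ({value})")
    return lines


if __name__ == "__main__":
    for line in describe_inputs(picr_inputs):
        print(line)
```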
EXERCISE 4
Try more workflows from MyExperiment
• Following the steps of “Exercise 3”, open and run other workflows from the
“Proteomics Bioinformatics Workshop 2011” pack that
interest you.
EXERCISE 5
Build a workflow of nested workflows
Reactome & Biomodels
• Let’s put together two workflows. For instance, let’s find biological
models from Biomodels and pathways from Reactome using a
Gene Ontology term as a query.
– Create a new workflow (Menu -> File/New workflow)
– From the list of workflows, available in the “Proteomics
Bioinformatics Workshop 2011” pack, click on “Find Biological
Models by GO”.
• Click on the “Import” button (as a nested workflow)
– From the list of workflows click on “Find Reactome pathways
and reactions by GO”.
• Click on the “Import” button (as a nested workflow)
– Click on the “Design” tab to see the nested workflows
– Create a new input
• In the menu: Insert/Input Workflow port
– Name: goAccession
– Type: Single value
– Click on the input box and drag towards “GoId” (Biomodels
input) and let go. An arrow will connect the two boxes.
– Click on the input box and drag towards “Go_accession”
(Reactome input) and let go. An arrow will connect the two
boxes.
– Create a new output to store Biomodels names
• In the menu: Insert/Output Workflow port
– Name: biomodels
– Create a new output to store Reaction names
• In the menu: Insert/Output Workflow port
– Name: reactomeReactions
– Create a new output to store Pathway names
• In the menu: Insert/Output Workflow port
– Name: reactomePathways
– Connect biomodels with MoldelsNames
– Connect reactomeReactions with reaction_name
– Connect reactomePathways with pathway_name
– In the menu click on “File” and “Run workflow”
– Try the workflow using a GO accession as input: “0006915”
– Run the workflow and check the results (values for biomodels,
reactomePathways and reactomeReactions)
1. How many models were found?
2. How many reactions?
3. How many pathways?
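The wiring built in this exercise can be written down as a list of links. The sketch below is only an illustration in Python, not how Taverna stores a workflow: the port names come from the steps above, while the `Biomodels.` and `Reactome.` prefixes and the helpers `sinks_of` and `sources_of` are hypothetical. It makes explicit that the single input fans out to both nested workflows and that each output is fed by exactly one port.

```python
# Links of the combined workflow as (source, sink) pairs.
# "Biomodels." and "Reactome." are hypothetical labels for the two nested workflows.
links = [
    ("goAccession", "Biomodels.GoId"),
    ("goAccession", "Reactome.Go_accession"),
    ("Biomodels.MoldelsNames", "biomodels"),
    ("Reactome.reaction_name", "reactomeReactions"),
    ("Reactome.pathway_name", "reactomePathways"),
]


def sinks_of(source, links):
    """Return every port that receives data from the given source port."""
    return [sink for src, sink in links if src == source]


def sources_of(sink, links):
    """Return every port that feeds the given sink port."""
    return [src for src, snk in links if snk == sink]


if __name__ == "__main__":
    # The workflow input is connected to both nested workflows.
    print(sinks_of("goAccession", links))
    # Each workflow output has a single source.
    for output in ("biomodels", "reactomeReactions", "reactomePathways"):
        print(output, "<-", sources_of(output, links))
```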
EXERCISE 6
Running a workflow several times
BLASTing multiple protein sequences
• We might find a workflow we would like to run several times. This
example will show you how to run Blast using a list of sequences.
– Find a workflow to make Blast searches
• Create a new workflow (Menu -> File/New workflow)
• Click on the “myExperiment” button
• Click on “Search” tab
• Search for “EBI NCBI BLAST”
• Find the workflow from “Katy Wolstencroft” for “Taverna2”
• Click on the “Import” button (as a nested workflow)
– Find a workflow to convert an input with multiple FASTA
sequences into a list of FASTA sequences
• Click on the “myExperiment” button
• Click on “Search” tab
• Search for “split fasta sequence”
• Find the “Fasta_string_to_fasta_list” workflow from “Hamish McWilliam” for
“Taverna1”
• Click on the “Import” button (as a nested workflow)
– Connect the “fasta_list” output with the “sequence” input
• Click on the “fasta_list” output box and drag towards the “sequence” input
box and let go. An arrow will connect the two boxes.
– Create a new input
• In the menu: Insert/Input Workflow port
– Name: email
– Type: Single value
– Click on the new “email” input box and drag towards “email”
(EBI_NCBI_BLAST input) and let go. An arrow will connect the
two boxes.
– Create a new output for each “EBI_NCBI_BLAST” output
• In the menu: Insert/Output Workflow port
– Name: choose a name
– Type: Single value
– In the menu click on “File” and “Run workflow”
– Try the workflow using your email address as input and a couple of protein
sequences in FASTA format
– Example of two sequences in FASTA format:
>protein1
CIMKEKKKPGMCSYSETFFMCSCSSDECNDNIIFSEEYNTSNPDLLLVIFQVTGISLLPPSVIIIFYCYRVNRQQKLSSTWETG
KTRKLMEFSEHCAMCSYSIILEDDRSDISSTCANNI
>protein2
SKANGEANRLLTNARITDSSIWSPQPGQHISIQMCSYSTYRELNPAPTSSPTSTRTEIQLNGENSRSTADLPMIHM
– Run the workflow and check the results
– If you want to run a BLAST search for more than 10 sequences, I would
recommend running the workflow with the command line tool.
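The splitting step in this exercise turns one input containing several FASTA sequences into a list, so that BLAST runs once per list item. The Python function below is a minimal sketch of that idea under simple assumptions (every record starts with a “>” header line); it is not the code of the “Fasta_string_to_fasta_list” workflow, and `split_fasta` is a hypothetical name. The example sequences are shortened fragments used only for illustration.

```python
def split_fasta(text):
    """Split a multi-FASTA string into a list of single-sequence FASTA strings."""
    records = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">") and current:
            # A new header closes the record collected so far.
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


if __name__ == "__main__":
    example = ">protein1\nCIMKEKKKPGMCSYSE\nKTRKLMEF\n>protein2\nSKANGEANRLLTNAR\n"
    # Each list item corresponds to one BLAST run in the workflow.
    for record in split_fasta(example):
        print(record.splitlines()[0])
```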
EXERCISE 7
Running workflows from the command line
BLASTing multiple protein sequences
Sometimes Taverna might run into memory problems. For instance, if
we send many BLAST queries, Taverna will struggle to hold all the results
in memory. One way to avoid this problem is to run the workflow we
have created from the command line and store the results in files.
Following the BLAST exercise, if you want to run a BLAST search for
more than 10 sequences, I would recommend running the workflow
from the command line.
• Copy your workflow and data into the Taverna folder
– Copy your workflow into the Taverna folder
• C:\Program Files\Taverna Workbench 2.3.0
– Create a file (yourfilename.txt) with your FASTA sequences.
– Save the file in the Taverna folder
• C:\Program Files\Taverna Workbench 2.3.0
• Run the workflow from the command line
– Open the command line
• Start / Run
• In the open field type “cmd” and click on “OK”
– Get into the Taverna directory
• If the last line does not show “C:\>”, type “c:” and press Enter.
• Get into the Taverna folder
– Type “cd C:\Program Files\Taverna Workbench 2.3.0”
– Run the workflow
• Type “executeworkflow.bat -embedded -inputvalue email your@email.com
-inputfile yourfilename.txt EBI_NCBI_BLAST-multifasta.t2flow”
– More documentation about how to use the command line tool ...
http://www.taverna.org.uk/documentation/taverna-2-x/command-line-tool/2-3/
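If you prefer to launch the run from a script, the command above can be assembled programmatically. The following Python sketch is an illustration only: `build_command` and `run_workflow` are hypothetical helper names, the arguments are copied in the same order as in the command above, and the exact option syntax should be checked against the linked documentation. `run_workflow` assumes Windows and the Taverna folder used in this exercise.

```python
import subprocess


def build_command(email, fasta_file, workflow_file):
    """Return the argument list of the command shown above, in the same order."""
    return [
        "executeworkflow.bat",
        "-embedded",
        "-inputvalue", "email", email,
        "-inputfile", fasta_file,
        workflow_file,
    ]


def run_workflow(taverna_dir, email, fasta_file, workflow_file):
    """Run the command inside the Taverna folder (Windows only); return the exit code."""
    command = ["cmd", "/c"] + build_command(email, fasta_file, workflow_file)
    completed = subprocess.run(command, cwd=taverna_dir)
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_workflow(...) on a machine with Taverna installed.
    print(" ".join(build_command(
        "your@email.com", "yourfilename.txt", "EBI_NCBI_BLAST-multifasta.t2flow")))
```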
Proteomics Services Team
Acknowledgements & thanks
All the myGrid team
Special thanks to … Katy Wolstencroft, Shoaib
Sufi, Peter Li, Eric Nzuobontane
