2011-06-08 Taverna workflow system

Taverna workflow system - presented by Stian Soiland-Reyes at ITER Integrated Modelling workshop in Cadarache, France on 2011-06-08.

Published in: Technology

Transcript of "2011-06-08 Taverna workflow system"

  1. http://taverna.org.uk/
     Stian Soiland-Reyes & Robert Haines
     myGrid, School of Computer Science, University of Manchester, UK
     ITER IM workshop, Château de Cadarache, 2011-06-08
  2. What is myGrid?
     • An e-Science collaboration since 2001
     • Not a grid!
     • Numerous partners involved:
         • University of Manchester
         • University of Southampton
         • University of Oxford
         • EMBL-EBI
     • Provides sustainable and production-quality software
         • Supported by OMII-UK, EPSRC and BBSRC
     • Mixture of developers, bioinformaticians and researchers
     Software | Services | Content | Skills | Community
     http://www.mygrid.org.uk/ http://www.taverna.org.uk/
  3. Motivation: Bioinformatics
     • Challenge:
         • Large amounts of data
         • Many open questions
         • Numerous freely available public datasets and analysis tools
  4. Huge amounts of data
     • Microarray: 1000+ genes
     • QTL regions: 100+ genes
     • Next Gen Sequencing: 100,000+ genes
     • How do I look at all the genes systematically?
  5. Manual approach
     • Search using public web sites and databases
         • PubMed
         • UniProt
         • EBI BioMart
     • Copy and paste to web tools for analysis
         • NCBI BLAST
         • EBI InterPro
     • Further processing locally
         • R
         • Perl
         • Python
  6. Manual: disadvantages
     • Scale of analysis task overwhelms researchers (lots of data)
     • User bias and premature filtering of datasets (cherry picking)
     • Hypothesis-driven approach to data analysis
     • Constant changes in data: problems with re-analysis of data
     • Implicit methodologies (hyper-linking through web pages)
     • Error proliferation from any of the listed issues, notably human error
  7. Web services and workflows
     • Web services
         • Technology and standards for exposing code and data resources that can be programmatically consumed by a remote third party
         • Description of how to interact with the service: parameters, documentation
     • Workflows
         • General technique for describing and executing a process
         • Describe what you want to do, and which services to run
  8. The Taverna Open Source Suite of Tools
     [Diagram labels: Web Portals; Workflow Repository; GUI Workbench; Client User Interfaces; Virtual Machine; Third Party Tools; Service Catalogue; Workflow Engine; Provenance Workflow Store Server; Activity and Service Plug-in Manager; Open Provenance Model; Programming and Secure Service Access APIs]
  9. Taverna workflows
     • A set of (local and remote) services to analyze or manage data
     • Nested workflows are also services
     • Data links connect services
         • i.e. output from service A is input to service B and C
     • Describes the desired dataflow instead of process coordination
     • Automatic iterations
     • Can customize list handling and control links
     [Figure: example workflow diagram. Workflow inputs: chromosome_name, start_position, end_position. Workflow outputs: gene_descriptions, genes_pathways, merged_pathways, pathway_descriptions, pathway_ids, kegg_external_gene_reference, report, ensembl_database_release, kegg_pathway_release.]
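The dataflow idea on the slide above (output from service A feeds both B and C, with automatic iteration over lists) can be sketched in a few lines of Python. This is only an illustration, not Taverna code: `service_a`, `service_b`, `service_c` and the helper functions are hypothetical stand-ins for real services.

```python
from typing import Dict, List


# Hypothetical stand-ins for services. In Taverna these could be web
# services, scripts, command line tools or nested workflows.
def service_a(gene: str) -> str:
    """Normalise a gene name."""
    return gene.strip().upper()


def service_b(gene_id: str) -> str:
    return f"entrez:{gene_id}"


def service_c(gene_id: str) -> str:
    return f"uniprot:{gene_id}"


def run_dataflow(workflow_input: str) -> Dict[str, str]:
    """Run A, then pass A's output along two data links to B and C."""
    a_out = service_a(workflow_input)
    # B and C depend only on A's output, not on each other, so a
    # workflow engine would be free to run them in parallel.
    return {"b_out": service_b(a_out), "c_out": service_c(a_out)}


def run_with_iteration(workflow_inputs: List[str]) -> List[Dict[str, str]]:
    """Mimic automatic iteration: a list is fed to a single-value workflow."""
    return [run_dataflow(item) for item in workflow_inputs]


if __name__ == "__main__":
    print(run_with_iteration(["brca1", " tp53 "]))
```

The point of the sketch is that only the data dependencies are declared; the order of execution follows from them rather than from explicit process coordination.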
  10. What types of services and data?
     • WSDL/SOAP web services
         • Secured invocation with HTTPS/SSL/WS-Security
     • RESTful web services
         • Secured invocation with HTTPS/Basic Auth
     • Spreadsheet import
     • Command line tools (local, SSH)
     • Inline scripts (Beanshell, R)
     • Excel/CSV spreadsheets
     • Java APIs
     • Customizations:
         • BioMart, BioMoby / SADI
         • Soaplab
         • Grid services (EGEE gLite, caGrid, PBS, UNICORE)
         • … your tool (Plugin tutorial in wiki)
  11. Service limitations
     • Web service creation involves wrapping existing tools or writing WS code
     • Web services can go down
         • Can use redundant services in workflow
         • Service monitoring
     • Transferring data up/down to WS is slow
         • Support references in WS interface
     • Executing command line tools directly requires execution access
         • Trickier to share workflows: requires either SSH/grid credentials or installing tools locally
  12. Which services?
     • Taverna is general, can connect to standard web services and command line tools for any domain
     • In bioinformatics:
         • From professional third-party organisations providing robust & open data/analysis services
         • ...to under-the-desk web services for one particular purpose, run by PhD students
     • http://biocatalogue.org/ - 2000+ services from 140+ providers, crowd-sourced and quality-monitored
  14. BioCatalogue integration
     • Search services from workbench
     • Add services to workflow
     • View service descriptions and uptime status from within workflow
  15. Taverna workbench
     • Graphical desktop tool
     • No server installation required
     • Drag-and-drop services into diagram
     • Connect services, run, reconnect, rerun
     • Integrates diverse set of tools
  19. Sharing workflows
     • myExperiment.org allows users to share, find, download and rate workflows
     • “Facebook for the scientist”
     • 4000+ members, 1400+ workflows
     • Open source code, can set up own instance
  20. myExperiment integration
     • Search and browse workflows
         • By tags
         • Free text search
         • Own/group workflows
         • Packs, e.g. “Examples”
     • Upload/share workflows
  21. Taverna workflow features
     • Nested workflows
         • Reuse existing components
     • Implicit iterations
         • With customizable list handling
     • Pipelining
         • Process partial iteration results early
     • Parallelisation
         • Run as soon as data is available
     • Retries, failover, looping
         • For stability and conditional testing
     • Plugin-extensible execution control
         • Ideas: caching, error detection, dynamic service lookup
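Retries and failover, as listed on the slide above, can be illustrated with a small Python sketch. It is not how Taverna implements them; `call_with_retries` and `call_with_failover` are hypothetical helper names, and the services are assumed to be plain callables that raise an exception when unavailable.

```python
import time
from typing import Any, Callable, Sequence

Service = Callable[[Any], Any]


def call_with_retries(service: Service, value: Any,
                      retries: int = 2, delay_s: float = 0.0) -> Any:
    """Call one service, retrying up to `retries` extra times on failure."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return service(value)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay_s)  # wait before the next attempt


def call_with_failover(services: Sequence[Service], value: Any,
                       retries: int = 2) -> Any:
    """Try redundant services in order until one of them succeeds."""
    errors = []
    for service in services:
        try:
            return call_with_retries(service, value, retries)
        except Exception as err:
            errors.append(err)  # remember the failure, try the next one
    raise RuntimeError(f"all {len(services)} services failed: {errors}")
```

Retrying covers short outages of a single service, while failover switches to a redundant service when the first stays down.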
  22. Extensible UI and engine
     • Plugins can provide new “perspectives”
         • e.g.: BioCatalogue, myExperiment
     • Provide service-specific customization
         • e.g.: BioMart interface replicates web site
     • Adding new functionality
         • New service types, e.g.: …
         • Execution control like looping/branching
         • Design helpers, “Find matching service”
  23. Workflow limitations
     • Initially designed for dataflows
         • Not suitable for business processes like “HR procedure for hiring new staff”
         • Long-running workflows require Taverna Server
         • ...but suitable for coordinating command line and grid executions; the data might just be job references
         • Execution control extensible, e.g.:
             • Looping, branching
             • Dynamic service lookup
             • Data manipulation, error detection
  24. Data and provenance handling
     • Data references passed between services in workflow
         • http, file, sftp, gridftp, etc. (extensible)
     • Data downloaded/uploaded or references translated when needed
     • Provenance captured for workflow runs
         • Trace execution steps, view intermediate values while running
         • Export as Open Provenance Model (OPM) / RDF
         • Proof and origin of produced outputs
         • Extensible annotations
     • Wf4Ever: reproducible research objects
         • Workflow/data as a scientific publication; preservation
         • Need to capture more service data and metadata
  25. Data limitations
     • Running Workbench limited by:
         • Local disk space for storing data
         • Network speeds for up/download
         • Firewall access
         • Alternative: execute workflow using Taverna Server, or command line remotely with ssh/job submission
     • No standardized WS reference mechanism
         • Agree on mechanism within WS ‘family’ with shared disk (e.g. deconstruct local path from HTTP URI)
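The last point above, deconstructing a local path from an HTTP URI when services share a disk, could look roughly like the following sketch. The URL prefix, mount point and function name are all assumptions made for illustration; the slides do not specify any particular convention.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlsplit

# Hypothetical convention agreed within one "family" of services:
# everything under this URL prefix is also visible on a shared disk.
SHARED_URL_PREFIX = "http://data.example.org/shared/"
SHARED_MOUNT = PurePosixPath("/mnt/shared")


def local_path_for(uri: str) -> Optional[PurePosixPath]:
    """Map a data reference to a shared-disk path, or None if not shared."""
    if not uri.startswith(SHARED_URL_PREFIX):
        return None  # not on the shared disk: caller must download instead
    prefix_path = urlsplit(SHARED_URL_PREFIX).path  # "/shared/"
    relative = unquote(urlsplit(uri).path[len(prefix_path):])
    # Refuse absolute paths or references that try to escape the mount.
    if relative.startswith("/") or ".." in PurePosixPath(relative).parts:
        return None
    return SHARED_MOUNT / relative


if __name__ == "__main__":
    print(local_path_for("http://data.example.org/shared/run1/result.txt"))
```

A service that receives such a reference can read the file directly from disk and avoid the upload/download round trip; any other URI falls back to normal transfer.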
  26. Parameter sweeps
     • Implicit iterations with pipelining provide an intuitive way to set up parameter sweeps
     • Advanced looping and extensible execution control allow iterative & recursive reductions/approximations
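A parameter sweep driven by implicit iteration amounts to calling a single-value service once per combination of list inputs. The Python sketch below shows the two common ways of combining input lists, all combinations (cross product) and element-by-element (dot product). The `simulate` service and both helper names are hypothetical and only stand in for a real workflow step.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence, Tuple

Result = Tuple[Dict[str, Any], Any]


def simulate(temperature: float, pressure: float) -> float:
    """Hypothetical single-value service being swept."""
    return temperature * pressure


def cross_product_sweep(service: Callable[..., Any],
                        **param_lists: Sequence[Any]) -> List[Result]:
    """Call the service once for every combination of parameter values."""
    names = list(param_lists)
    results = []
    for combo in product(*(param_lists[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((params, service(**params)))
    return results


def dot_product_sweep(service: Callable[..., Any],
                      **param_lists: Sequence[Any]) -> List[Result]:
    """Pair parameter values by position; all lists must be equally long."""
    if len({len(values) for values in param_lists.values()}) > 1:
        raise ValueError("dot product needs lists of equal length")
    names = list(param_lists)
    results = []
    for combo in zip(*(param_lists[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((params, service(**params)))
    return results


if __name__ == "__main__":
    for params, value in cross_product_sweep(
            simulate, temperature=[1.0, 2.0], pressure=[10.0, 20.0, 30.0]):
        print(params, value)
```

With two temperatures and three pressures, the cross product yields six runs, while the dot product requires equally long lists and yields one run per position.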
  27. Taverna command line
     • Executes from Windows/Linux/OS X shells
     • Takes a predefined workflow with files as inputs and outputs
     • Quick way to “productionize” a workflow
  28. Taverna Server
     • REST/SOAP interface to execute workflows
     • Client libraries for Ruby and Java
     • Two demonstration web interfaces
         • Ruby
         • Java Portlets
     • Upcoming:
         • Security delegation
         • AWS image
  29. Taverna portlet
     • Example portlet interface
     • Executes workflows using Taverna Server
  31. Ruby web interface
     • Example customized web interface
     • Uses Ruby gem t2-server
  32. Grids and clusters
     • Taverna has been integrated with several leading grid and middleware infrastructures, such as:
         • PBS
         • caGrid/Globus
         • EGEE/gLite
         • NorduGrid’s ARC
         • JSDL/GridSAM
     • Plans for SAGA integration
  33. Taverna on the cloud
     • Use case:
         • SNP analysis and annotation of genome sequenced from breeds of cows in Africa: why are some of them resistant to X?
         • Amazon EC2 with Taverna Server and local services
         • Ruby on Rails web interface
         • Runs through 31 chromosomes in 2 hours using 10 instances, for $10
  35. Taverna 3 roadmap
     • OSGi plugin system
     • Workflow language: Scufl2
         • Compound format; embedding metadata, dependencies, independent API for creating/inspecting workflows
     • Components
         • Finding/sharing command line tool descriptions
         • Richer way of finding compatible services
  36. Open source, open development
     • The Taverna suite of tools is all open source, free to use and customize
     • Large user community, active mailing lists
     • Lead developers: myGrid in Manchester, UK
     • Contributors from across the world
     • PAL programme
     • myGrid provides training, tutorials and documentation
  37. Who uses Taverna?
     • Bioinformatics: EMBL-EBI, ONDEX
     • Astronomy: HELIO, AstroGrid, SAMPO
     • Engineering: NASA Jet Propulsion Lab (JPL)
     • Chemistry: CDK, CIC
     • Biodiversity: BioVel
     • Preservation: Wf4Ever, SCAPE
     • BioMedicine/Cancer research: caGrid
     • Data/text mining: eLico, AID
  38. Taverna in numbers
     myExperiment: 4000+ registered users, 56 countries, Taverna: 1400+ workflows, 361 organisations, 48 countries, BioCatalogue: 70,000+ downloads, 2000+ services, ~4000 source, 150+ service providers, 500+ members, 27 countries
  39. Acknowledgements
  41. More information
     • http://www.mygrid.org.uk/
     • http://www.taverna.org.uk/
     • http://www.myexperiment.org/
     • http://www.biocatalogue.org/