Paper presentation: Taverna, reloaded

citation:
Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., et al. (2010).
Taverna, reloaded. In M. Gertz, T. Hey, & B. Ludaescher (Eds.), Procs. SSDBM 2010. Heidelberg, Germany.
Paper available at: http://www.ssdbm2010.org/.

Published in: Technology, Health & Medicine

  1. 1. Taverna, reloaded. Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Alex Nenadic, Ian Dunlop, Alan Williams, Carole Goble (School of Computer Science, University of Manchester, UK); Wei Tan (Argonne National Laboratory, Argonne, IL, USA); Tom Oinn (EBI, UK). SSDBM, Heidelberg, Germany, June 30 - July 2, 2010
  2. 2. Taverna: scientific workflow management. Goal: perform cancer diagnosis using microarray analysis – learn a model for lymphoma type prediction based on samples from different lymphoma types. Workflow steps: • lymphoma samples ➔ hybridization data • process microarray data as training dataset • learn predictive model. Source: caGrid, http://www.myexperiment.org/workflows/746
  7. 7. In this (short) talk... • Taverna for data-intensive science – architectural goals... and how they are being achieved • Performance figures on benchmark workflows
  8. 8. Target -ilities • Scalability – size of input data / size of input collections • Configurability – each processor in the workflow individually tunable to adjust to the underlying platform • Extensibility – plugin architecture to accommodate specific service groups (caGrid, BioMoby) – extensible set of operators that define the semantics of the model
  9. 9. Opportunities for parallel processing • intra-processor: implicit iteration over list data • inter-processor: pipelining (Figure: input list [ id1, id2, id3, ... ]; per-element invocations SFH1, SFH2, SFH3, ... and getDS1, getDS2, getDS3, ...; output list [ DS1, DS2, DS3, ... ])
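The two forms of parallelism named on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Taverna's implementation: `get_ds` loosely mirrors the `getDS` label in the figure, while `summarise`, `run_pipeline` and the `threads_per_processor` parameter are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def get_ds(item_id):
    # Hypothetical stand-in for a service call that fetches one data set.
    return f"DS({item_id})"


def summarise(data_set):
    # Hypothetical downstream processor that consumes one data set.
    return len(data_set)


def run_pipeline(ids, threads_per_processor=3):
    """Apply get_ds, then summarise, to every element of ids.

    Implicit iteration: each processor is applied element by element.
    Pipelining: summarise can start on an element as soon as get_ds has
    produced it, without waiting for the whole intermediate list.
    """
    with ThreadPoolExecutor(threads_per_processor) as stage1, \
         ThreadPoolExecutor(threads_per_processor) as stage2:
        # One task per list element (intra-processor parallelism).
        fetched = [stage1.submit(get_ds, i) for i in ids]
        # Each element is handed to the next stage individually (pipelining).
        results = [stage2.submit(lambda f=f: summarise(f.result()))
                   for f in fetched]
        # Collect in input order so the output list mirrors the input list.
        return [r.result() for r in results]
```

Calling `run_pipeline(["id1", "id2", "id3"])` returns one result per input element, in input order, which is the list-in, list-out behaviour the figure depicts.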
  12. 12. Configurable processor behaviour. Default dispatch stack (the Request goes down the stack, the Response comes back up): • Parallelise: allocates concurrent threads to list elements • Provenance: captures service invocation events • Error bounce • Failover: on a persistent server error, tries each available service implementation in turn • Retry: on transient errors, retries one service invocation • Stop/pause • Invoke: interacts with the underlying activity – service operation – shell script execution
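The layering idea behind the dispatch stack can be sketched as function composition in Python. This is a hedged illustration, not Taverna's API: only the Parallelise, Failover, Retry and Invoke layers are modelled, and every name here (`TransientError`, `invoke`, `retry`, `failover`, `parallelise`, `dispatch`, `max_attempts`, `max_threads`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Error that may disappear if the same call is repeated."""


def invoke(service, job):
    # Bottom layer: call the underlying activity.
    return service(job)


def retry(next_layer, max_attempts=3):
    # Retry layer: repeat one service invocation on transient errors.
    def layer(service, job):
        for attempt in range(max_attempts):
            try:
                return next_layer(service, job)
            except TransientError:
                if attempt == max_attempts - 1:
                    raise
    return layer


def failover(next_layer):
    # Failover layer: try each available implementation in turn.
    def layer(services, job):
        last_error = None
        for service in services:
            try:
                return next_layer(service, job)
            except Exception as err:
                last_error = err
        raise RuntimeError("all service implementations failed") from last_error
    return layer


def parallelise(next_layer, max_threads=4):
    # Parallelise layer: allocate concurrent threads to list elements.
    def layer(services, jobs):
        with ThreadPoolExecutor(max_threads) as pool:
            return list(pool.map(lambda job: next_layer(services, job), jobs))
    return layer


# Layers are stacked top-down; each one wraps the layer below it.
dispatch = parallelise(failover(retry(invoke)))
```

Because each layer only wraps the one below, a per-processor configuration amounts to choosing which layers to stack and with which parameters, for example `parallelise(failover(retry(invoke, max_attempts=5)), max_threads=8)`.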
  13. 13. Extensible processor semantics. Enhancing the dataflow model with a limited form of loops to model busy waiting (asynchronous service interaction). (a) Default dispatch stack: Parallelise, Error bounce, Failover, Retry, Invoke. (b) Extended dispatch stack: Parallelise, Error bounce, Loop/Branching, Failover, Retry, Invoke. (Figure: busy waiting using an explicit workflow pattern, compared with busy waiting using a loop processor)
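Busy waiting through a loop layer can be sketched as one more wrapper around the invocation. This is a minimal Python sketch under stated assumptions, not the actual Taverna loop layer: the status dictionary, `make_status_service`, `is_done`, `poll_interval` and `max_polls` are all hypothetical.

```python
import time


def invoke(service, job):
    # Bottom layer: call the underlying activity once.
    return service(job)


def loop(next_layer, is_done, poll_interval=0.0, max_polls=100):
    # Loop layer: re-invoke the layer below until the condition holds.
    def layer(service, job):
        for _ in range(max_polls):
            response = next_layer(service, job)
            if is_done(response):
                return response
            time.sleep(poll_interval)
        raise TimeoutError("job did not complete within max_polls")
    return layer


def make_status_service(polls_until_done):
    # Hypothetical asynchronous service: reports RUNNING a fixed number
    # of times, then DONE.
    remaining = {"n": polls_until_done}

    def get_status(job_id):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return {"job": job_id, "status": "RUNNING"}
        return {"job": job_id, "status": "DONE"}

    return get_status


poll_until_done = loop(invoke, is_done=lambda r: r["status"] == "DONE")
```

With the loop expressed as a layer, the workflow itself contains a single status-checking processor; the repetition is part of that processor's configuration rather than an explicit pattern of extra processors and links.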
  17. 17. Performance experimental setup • previous version of Taverna engine used as baseline • objective: to measure incremental improvement • Parameters: – byte size of list elements (strings) – size of input list – length of linear chain • main insight: when the workflow is designed for pipelining, parallelism is exploited effectively (Figure: list generator; multiple parallel pipelines)
  18. 18. I - Memory usage • shorter execution time due to pipelining • list size: 1,000 strings of 10K chars each • no intra-processor parallelism (1 thread/processor) (Plot series: T2 main memory data management; T2 main embedded Derby back-end; T1 baseline)
  20. 20. Available processors pool • pipelining in T2 makes up for smaller pools of threads/processor
  21. 21. Bounded main memory usage • Separation of data and process spaces ensures scalable data management • varying data element size: 10K, 25K, 100K chars
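One way to picture the separation of data and process spaces is a store that holds the values while processors exchange only small references. The Python sketch below is an assumption-laden illustration, not Taverna's data manager: `DataStore`, `put`, `get` and `to_upper` are hypothetical names, and `sqlite3` merely stands in for an embedded database back-end.

```python
import sqlite3
import uuid


class DataStore:
    """Holds data values; workflow code passes only references."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS data (ref TEXT PRIMARY KEY, value TEXT)"
        )

    def put(self, value):
        # Register a value and return a short, fixed-size reference to it.
        ref = uuid.uuid4().hex
        self._db.execute("INSERT INTO data VALUES (?, ?)", (ref, value))
        self._db.commit()
        return ref

    def get(self, ref):
        # Resolve a reference back to its value.
        row = self._db.execute(
            "SELECT value FROM data WHERE ref = ?", (ref,)
        ).fetchone()
        if row is None:
            raise KeyError(ref)
        return row[0]


def to_upper(store, ref):
    # A processor dereferences its input only while computing, then
    # registers the output and hands on a new reference.
    return store.put(store.get(ref).upper())
```

In this sketch the references travelling between processors stay the same size whether an element holds 10K or 100K characters, which is the intuition behind memory use that stays bounded as element size grows.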
  22. 22. Summary • Taverna 2 re-engineered for scalability on data-intensive pipelined dataflows • separation of data and process spaces • generic stack pattern for principled extensions – limited while loop, if-then-else – provenance capture – ... • open plugin architecture accommodates further extensions. For further details and a nice chat: please visit our poster!!
