
Taverna 2 in Pictures



Title: Taverna 2 in Pictures
Author: Tom Oinn

Published in: Technology, Business

Taverna 2 in Pictures

  1. Tom Oinn, BOSC2007, 19th July
  2. Data Document
     Access is through the DataManager interface locally and through DataPeer remotely.
     • DataManager instance: has a unique namespace within the peer group and stores the entities below.
     • Data Document: identifier with namespace; zero or more reference scheme instances (LSID reference, File reference, …..) pointing to identical immutable data.
     • List (depth): identifier with namespace; depth; list of child IDs.
     • Error: identifier with namespace; depth; detail.
     • Reference Scheme Plugins (extension point): LSID, URL, File, …?
     • Locational Context Configuration, per reference scheme:
       • LSID: no context required?
       • URL: local network name, subnet mask
       • File: file system name and mount point
       • …?: whatever you need here
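The entity kinds on this slide can be sketched as plain data classes. This is a minimal Python illustration, not the Taverna 2 Java API: every class, field and function name below is hypothetical, and the locational context is reduced to an assumed set of scheme names that the caller can dereference.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class EntityId:
    namespace: str  # namespace of the owning DataManager (unique in the peer group)
    local_id: str


@dataclass(frozen=True)
class Reference:
    scheme: str  # hypothetical scheme names: "lsid", "url", "file"
    target: str


@dataclass
class DataDocument:
    id: EntityId
    # Zero or more references, all pointing to identical immutable data.
    references: List[Reference] = field(default_factory=list)


@dataclass
class EntityList:
    id: EntityId
    depth: int
    children: List[EntityId] = field(default_factory=list)


@dataclass
class ErrorDocument:
    id: EntityId
    depth: int
    detail: str


def pick_reference(doc: DataDocument, usable_schemes: Set[str]) -> Optional[Reference]:
    """Return the first reference whose scheme is usable here, else None."""
    for ref in doc.references:
        if ref.scheme in usable_schemes:
            return ref
    return None


# Illustrative values only; the LSID and path are made up.
doc = DataDocument(
    EntityId("dm-a", "doc1"),
    [Reference("lsid", "urn:lsid:example.org:data:1"),
     Reference("file", "/mnt/shared/data1")],
)
```

A caller that can only reach the shared file system would call `pick_reference(doc, {"file"})`. A caller with no usable scheme gets `None` and would have to fetch the data some other way.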
  3. Example nested list structure: List1 contains List2 (Leaf1, Leaf2) and List3 (Leaf3).
     Appears on data link as: Leaf3[1,0] List3[1] Leaf2[0,1] Leaf1[0,0] List2[0] List1[]
     • Downstream process filters on the event depth it needs.
     • If the minimum depth is too high it iterates, discarding all but the finest grained events.
     • If the maximum depth is too low it wraps in a new single element collection, discarding all but the root event.
     • Identifiers in the boxes are those from the previous slide.
     Processors (or, more accurately, service proxies) can now emit results piece by piece:
     • Sensor proxy that can emit a temperature reading / cell count / image every ten seconds
     • Database query that returns rows one row at a time from the data server
     Management of collection events is handled by the framework.
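The indexed events and the depth filtering can be sketched in a few lines of Python. This is an illustration under assumptions, not Taverna code: the function names are hypothetical, children are assumed to be emitted before their parent list, and the selector buffers all events for simplicity where a real streaming consumer would not.

```python
def depth_of(value):
    """Leaves have depth 0; a list is one deeper than its deepest child."""
    if not isinstance(value, list):
        return 0
    return 1 + max((depth_of(child) for child in value), default=0)


def emit_events(value, index=()):
    """Yield (index, depth, value) events, children before their parent list."""
    if isinstance(value, list):
        for i, child in enumerate(value):
            yield from emit_events(child, index + (i,))
    yield index, depth_of(value), value


def select_for_depth(events, wanted):
    """Keep the events a consumer of the given depth needs."""
    events = list(events)  # buffered for simplicity
    root_index, root_depth, root_value = events[-1]  # root is emitted last
    if wanted > root_depth:
        # Too shallow: keep only the root event and wrap it in new
        # single element collections until the depth matches.
        for _ in range(wanted - root_depth):
            root_value = [root_value]
        return [(root_index, wanted, root_value)]
    # Otherwise filter on the wanted depth, discarding everything else.
    return [event for event in events if event[1] == wanted]


# The structure from the slide: List1 = [List2 = [Leaf1, Leaf2], List3 = [Leaf3]]
data = [["leaf1", "leaf2"], ["leaf3"]]
indices = [index for index, _, _ in emit_events(data)]
leaves = [value for _, _, value in select_for_depth(emit_events(data), 0)]
```

Here `indices` carries the same index paths as the slide (`(0, 0)` for Leaf1, `(0,)` for List2, `()` for List1), and a consumer that wants single items sees only the three leaves.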
  4. Taverna 2 opens up the per-processor dispatch logic.
     • Dispatch layers can ignore, pass unmodified, block, modify or act on any message and can communicate with adjacent layers.
     • Each processor contains a single stack of arbitrarily many dispatch layers.
     • Dispatch layer composition allows for complex control flow within a given processor.
     • DispatchLayer is an extensibility point. Use it to implement dynamic binding, caching, recursive behaviour…?
     Diagram (single dispatch layer, DispatchLayer implementation):
     • Job specification messages from layer above: Job Queue & Service List, Single Job & Service List
     • Data and error messages from layer below: Fault Message, Result Message
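A dispatch stack of this kind can be sketched as a chain of objects that forward messages by default. The Python below is an assumption-laden illustration rather than the real DispatchLayer interface: all class and method names are hypothetical, jobs are plain strings, and the caching layer stands in for one of the uses the slide suggests.

```python
class DispatchLayer:
    """Default behaviour: pass every message through unmodified."""

    def __init__(self):
        self.above = None
        self.below = None

    def receive_job(self, job):
        self.below.receive_job(job)

    def receive_result(self, job, result):
        self.above.receive_result(job, result)

    def receive_fault(self, job, fault):
        self.above.receive_fault(job, fault)


def link(layers):
    """Wire adjacent layers together and return the top of the stack."""
    for upper, lower in zip(layers, layers[1:]):
        upper.below = lower
        lower.above = upper
    return layers[0]


class Collector(DispatchLayer):
    """Top of the stack: records whatever comes back up."""

    def __init__(self):
        super().__init__()
        self.results = []

    def receive_result(self, job, result):
        self.results.append((job, result))


class Cache(DispatchLayer):
    """Blocks jobs it has already seen and answers them itself."""

    def __init__(self):
        super().__init__()
        self.seen = {}

    def receive_job(self, job):
        if job in self.seen:
            self.above.receive_result(job, self.seen[job])
        else:
            self.below.receive_job(job)

    def receive_result(self, job, result):
        self.seen[job] = result
        self.above.receive_result(job, result)


class UpperCase(DispatchLayer):
    """Bottom of the stack: a stand-in for actually invoking a service."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def receive_job(self, job):
        self.calls += 1
        self.above.receive_result(job, job.upper())


bottom = UpperCase()
top = link([Collector(), Cache(), bottom])
top.receive_job("a")
top.receive_job("a")  # answered by the cache, never reaches the bottom layer
```

Because every layer defaults to forwarding, a new behaviour only overrides the messages it cares about, which is what makes the layers composable.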
  5. This dispatch stack configuration replicates the current Taverna 1 processor logic in that retry is within failover and both are within the parallelize layer. Layers can occur multiple times, you could easily have retry both above and below the failover layer for example.
     Parallelize
     • Ensures that at least ‘n’ jobs are pulled from the queue and sent to the layer below
     • Reacts to faults and results by pulling more jobs off the queue and sending them down, passing the fault or result message back up to the stack manager
     Failover
     • Responds to job events from above by storing the job, removing all but one service from the service list and passing the job down.
     • Responds to faults by fetching the corresponding job, rewriting the original service set to include only the next service and resending the job down. If no more services are available propagate the fault upwards
     • Responds to results by discarding any failover state for that job
     Retry
     • Responds to jobs by storing the job along with an initial retry count of zero
     • Responds to faults by checking the retry count, and either incrementing and resending the job or propagating the fault message if the count is exceeded
     Invoke
     • Responds to jobs by invoking the first concrete service in the service list with the specified input data
     • Sends fault and results messages to the layer above
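The Failover, Retry and Invoke rules above can be sketched as follows. This is a synchronous Python illustration with hypothetical names, not the Taverna 2 implementation: services are assumed to be plain callables that raise on failure, and the Parallelize layer is left out because nothing here runs concurrently.

```python
class Layer:
    def __init__(self):
        self.above = self.below = None

    def job(self, job_id, services, data):
        self.below.job(job_id, services, data)

    def result(self, job_id, value):
        self.above.result(job_id, value)

    def fault(self, job_id, error):
        self.above.fault(job_id, error)


class Top(Layer):
    def __init__(self):
        super().__init__()
        self.results, self.faults = {}, {}

    def result(self, job_id, value):
        self.results[job_id] = value

    def fault(self, job_id, error):
        self.faults[job_id] = error


class Failover(Layer):
    def __init__(self):
        super().__init__()
        self.state = {}  # job_id -> (services not yet tried, data)

    def job(self, job_id, services, data):
        self.state[job_id] = (list(services[1:]), data)
        self.below.job(job_id, services[:1], data)  # pass down one service only

    def fault(self, job_id, error):
        remaining, data = self.state[job_id]
        if remaining:
            self.state[job_id] = (remaining[1:], data)
            self.below.job(job_id, remaining[:1], data)  # resend with next service
        else:
            del self.state[job_id]
            self.above.fault(job_id, error)

    def result(self, job_id, value):
        self.state.pop(job_id, None)  # discard failover state
        self.above.result(job_id, value)


class Retry(Layer):
    def __init__(self, max_retries):
        super().__init__()
        self.max_retries = max_retries
        self.state = {}  # job_id -> [retry count, services, data]

    def job(self, job_id, services, data):
        self.state[job_id] = [0, services, data]
        self.below.job(job_id, services, data)

    def fault(self, job_id, error):
        entry = self.state[job_id]
        if entry[0] < self.max_retries:
            entry[0] += 1
            self.below.job(job_id, entry[1], entry[2])
        else:
            del self.state[job_id]
            self.above.fault(job_id, error)

    def result(self, job_id, value):
        self.state.pop(job_id, None)
        self.above.result(job_id, value)


class Invoke(Layer):
    def job(self, job_id, services, data):
        try:
            value = services[0](data)  # first concrete service in the list
        except Exception as error:
            self.above.fault(job_id, error)
        else:
            self.above.result(job_id, value)


def build_stack(max_retries):
    layers = [Top(), Failover(), Retry(max_retries), Invoke()]
    for upper, lower in zip(layers, layers[1:]):
        upper.below, lower.above = lower, upper
    return layers[0]


calls = []

def broken(data):
    calls.append("broken")
    raise RuntimeError("service down")

def working(data):
    calls.append("working")
    return data * 2

top = build_stack(max_retries=1)
top.job("j1", [broken, working], 21)
```

With retry below failover, the broken service is tried twice (one retry) before failover moves on to the working one. Swapping the two layers in `build_stack` would instead retry the whole failover sequence.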
  6. ‘Service’ in this case means ‘Taverna 2 proxy to something we can invoke’ (name might change!)
     Service invocation is asynchronous by default: all AsynchronousService implementations should return control immediately and, ideally, use thread pooling amongst instances of that type.
     Results and failure messages are pushed to an AsynchronousServiceCallback object, which also provides the necessary context to the invocation:
     DataManager
     • Resolve input data references
     • Register result data to get an identifier to return
     SecurityManager
     • Provides a set of security agents available to manage authentication against protected resources
     Provenance Connector
     • Allows explicit push of actor state P-assertions to a connected provenance store for invocation specific metadata capture
     Message Push
     • Used to push fault and result messages back to the invocation layer of the dispatch stack
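The callback pattern described here can be sketched with a shared thread pool. The Python below is only an illustration of the idea: the class and method names are hypothetical stand-ins, not the AsynchronousService or AsynchronousServiceCallback interfaces themselves, and the callback carries none of the context objects (DataManager, SecurityManager, provenance connector) listed on the slide.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Callback:
    """Receives exactly one result or fault for an invocation."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.fault = None

    def receive_result(self, value):
        self.result = value
        self.done.set()

    def receive_fault(self, error):
        self.fault = error
        self.done.set()


class ReverseService:
    # One pool shared by all instances of this service type.
    _pool = ThreadPoolExecutor(max_workers=4)

    def invoke(self, data, callback):
        """Schedule the work and return control immediately."""
        self._pool.submit(self._run, data, callback)

    @staticmethod
    def _run(data, callback):
        try:
            value = data[::-1]
        except Exception as error:
            callback.receive_fault(error)
        else:
            callback.receive_result(value)


ok = Callback()
ReverseService().invoke("abc", ok)

bad = Callback()
ReverseService().invoke(None, bad)  # None cannot be sliced, so this faults

ok_finished = ok.done.wait(timeout=5)
bad_finished = bad.done.wait(timeout=5)
```

The caller never blocks inside `invoke`. It learns the outcome only through the callback, which is where a dispatch layer would pick up the result or fault message.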
  7. In this scenario the agent is discovered based on the service, a message is passed to the agent to be signed and that message relayed. Credentials never leave the agent!
     Diagram: Client, Service, Security Agent (Policy, Policy engine, Set of credentials)
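The signing flow can be sketched as below. This Python illustration makes several assumptions that the slide does not state: the credential is modelled as a shared secret, signing is done with HMAC-SHA256, agents are looked up in a plain dictionary keyed by service name, and all class and function names are hypothetical. The policy engine is omitted.

```python
import hashlib
import hmac


class SecurityAgent:
    """Holds a credential and signs messages on request."""

    def __init__(self, secret: bytes):
        self._secret = secret  # never handed out, only used inside sign()

    def sign(self, message: bytes) -> str:
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()


class ProtectedService:
    """Accepts a message only if its signature verifies."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def handle(self, message: bytes, signature: str) -> bool:
        expected = hmac.new(self._secret, message, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)


def call_service(agents, service_name, service, message: bytes) -> bool:
    agent = agents[service_name]     # agent discovered based on the service
    signature = agent.sign(message)  # client sees the signature, not the secret
    return service.handle(message, signature)  # signed message is relayed


secret = b"illustrative-secret"
agents = {"blast": SecurityAgent(secret)}
service = ProtectedService(secret)
accepted = call_service(agents, "blast", service, b"run job 1")
```

The client code only ever touches the message and its signature, which is the property the slide emphasises.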
  8. Taverna 2 combines data managers, workflow enactors and security agents into transient collaborative virtual experiments within a peer group. These groups can be shared and membership managed over time and can persist beyond a single workflow run.
     Diagram: Peer group (i.e. JXTA group), the Virtual Experiment Session, containing User 1 and User 2 (each with a Policy engine and a Set of credentials), an Enactor and three DMs, connected to External Services and External Data Stores (i.e. SRB)
  9. Define a workflow as nested boundaries of control. Each boundary pushes its identifier onto an ID stack on data entering it and pops it when exiting. When a new ID is created the controlling entity registers with a singleton monitor tree, attaching to the parent identified by the path defined by the previous value of the ID stack on that data.
     Each node defines a set of properties. If a property is mutable it can be used to steer the enactment. Properties could include parallelism setting, service binding criteria, current job queue length, queue consumption, number of failures in the last minute…
     Diagram: a workflow (labels WF1, P1, P2, P3, WF2, Q1, "Iteration over nested workflow here…") alongside its monitor tree (labels WF1_1, P1, P2, P3, WF2_1, WF2_2, Q1)
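The ID stack and monitor tree can be sketched as follows. This is a Python illustration with hypothetical names, not the Taverna 2 monitor: the ID stack is modelled as an immutable tuple travelling with the data, identifiers are assumed to be the boundary name plus an entry counter, and the singleton is a module-level object.

```python
class MonitorTree:
    def __init__(self):
        self.root = {"properties": {}, "children": {}}

    def node(self, path):
        """Walk from the root along a path of node identifiers."""
        current = self.root
        for node_id in path:
            current = current["children"][node_id]
        return current

    def register(self, parent_path, node_id, properties):
        """Attach a new node under the parent identified by parent_path."""
        self.node(parent_path)["children"][node_id] = {
            "properties": dict(properties),
            "children": {},
        }


MONITOR = MonitorTree()  # the single shared monitor tree


class Boundary:
    """A boundary of control, such as a workflow or a processor."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.entries = 0

    def enter(self, id_stack):
        """Create a new ID, register it, and push it onto the data's stack."""
        self.entries += 1
        node_id = f"{self.name}_{self.entries}"
        MONITOR.register(id_stack, node_id, self.properties)
        return id_stack + (node_id,)

    @staticmethod
    def exit(id_stack):
        """Pop this boundary's ID when the data leaves it."""
        return id_stack[:-1]


wf1 = Boundary("WF1")
wf2 = Boundary("WF2", parallelism=1)

outer = wf1.enter(())
first = wf2.enter(outer)   # first iteration over the nested workflow
second = wf2.enter(outer)  # second iteration registers a sibling node

# Steering: change a mutable property on one running node only.
MONITOR.node(second)["properties"]["parallelism"] = 4
```

Each iteration over the nested workflow gets its own node under the same parent, so a steering agent can inspect or adjust one iteration without touching the other.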
  10. Status
     • Due December 2007 in ‘visible to end user’ form.
     • This release will probably not include everything, esp. steering agents and virtual experiment management.
     • Early tech preview real soon now [tm]
     • Complete code rewrite, current status is around 90% complete on enactor and data manager core.
     • Java code in CVS on sourceforge, project name is ‘taverna’, CVS module is ‘t2core’
     • Licensed under LGPL at present
     • Hands on session later if anyone’s interested?
  11. Acknowledgements
     Investigators
     • Matthew Addis, Andy Brass, Alvaro Fernandes, Rob Gaizauskas, Carole Goble, Chris Greenhalgh, Luc Moreau, Norman Paton, Peter Rice, Alan Robinson, Robert Stevens, Paul Watson, Anil Wipat
     Postgraduates
     • Tracy Craddock, Keith Flanagan, Antoon Goderis, Alastair Hampshire, Duncan Hull, Martin Szomszor, Kaixuan Wang, Qiuwei Yu, Jun Zhao
     Core Research and Development
     • Nedim Alpdemir, Pinar Alper, Khalid Belhajjame, Tim Carver, Rich Cawley, Justin Ferris, Matthew Gamble, Kevin Glover, Mark Greenwood, Ananth Krishna, Matt Lee, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Arijit Mukherjee, Tom Oinn, Stuart Owen, Juri Papay, Savas Parastatidis, Matthew Pocock, Stefan Rennick-Egglestone, Ian Roberts, Martin Senger, Nick Sharman, Stian Soiland, Victor Tan, Franck Tanoh, Daniele Turi, Alan R. Williams, David Withers, Katy Wolstencroft and Chris Wroe
     Pioneers
     • Hannah Tipney, May Tassabehji, Medical Genetics team at St Marys Hospital, Manchester, UK; Simon Pearce, Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK; Doug Kell, Peter Li, Manchester Centre for Integrative Systems Biology, UoM, UK; Andy Brass, Paul Fisher, Bio-Health Informatics Group, UoM, UK; Simon Hubbard, Faculty of Life Sciences, UoM, UK
     Funding and Industrial
     • EPSRC
     • Wellcome Trust
     • OMII-UK
     • Dennis Quan, Sean Martin, Michael Niemi (IBM), Mark Wilkinson (BioMOBY)
     Additional T2 thanks to Matthew Pocock, Thomas Down & David DeRoure amongst others!
     Please see for most up to date list