Taverna: From Biology to Astronomy
Dr Katy Wolstencroft, University of Manchester, myGrid, OMII-UK
wolstencroft-ogf20-astro

Published in: Economy & Finance, Technology

  1. Taverna: From Biology to Astronomy
     Dr Katy Wolstencroft, University of Manchester
     myGrid, OMII-UK
  2. What is Taverna?
     • An environment for workflow design and execution
     • User interface to a larger suite of middleware: myGrid
     • Designed to support in silico experiments in biology
     • Open source
  3. OMII: Open Middleware Infrastructure Institute
     • The University of Manchester joined with the Universities of Edinburgh and Southampton in March 2006
     • OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its international collaborators
     • A guarantee of development and support
  4. The Life Science Community
     • In silico biology is an open community
     • Open access to data
     • Open access to resources
     • Open access to tools
     • Open access to applications
     • Global in silico biological research
  5. The Community Problems
     • Everything is distributed
         • Data, resources and scientists
     • Heterogeneous data
     • Very few standards
         • I/O formats, data representation, annotation
         • Everything is a string!
     • Integration of data and interoperability of resources is difficult
  6. Lots of Resources
     • NAR 2007: 968 databases
  7. Traditional Bioinformatics
     12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
     12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
     12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
     12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
     12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
     12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
     12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
     12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
     12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
     12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
     12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
  8. Workflows as a Solution
     • Describes what you want to do, not how you want to do it
     • High-level description of the experiment
     • Easier to explain, share, relocate, reuse and repurpose
     • Workflow <=> Model
     • Workflow is the integrator of knowledge
     • The METHODS section of a scientific publication
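The "what, not how" idea on the Workflows as a Solution slide can be illustrated with a minimal Python sketch. It is not Scufl and not Taverna's API. The step names (`fetch_sequence`, `to_upper`, `gc_count`) and the `execute` helper are hypothetical, chosen only for illustration. The workflow is plain data that declares each step and its inputs, and a small generic runner derives the execution order from those declarations.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step declares WHAT it needs, not WHEN it runs.
workflow = {
    "fetch_sequence": {"needs": [], "run": lambda: "acatttctac"},
    "to_upper": {"needs": ["fetch_sequence"], "run": lambda seq: seq.upper()},
    "gc_count": {
        "needs": ["to_upper"],
        "run": lambda seq: seq.count("G") + seq.count("C"),
    },
}


def execute(workflow):
    """Run every step once, in an order derived from the declared inputs."""
    # Map each step to the set of steps that must finish before it.
    graph = {name: set(step["needs"]) for name, step in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        step = workflow[name]
        args = [results[dep] for dep in step["needs"]]
        results[name] = step["run"](*args)
    return results


results = execute(workflow)
```

Because the description is data rather than a script, the same dictionary could be shared, inspected or re-run with one step swapped out, which is the reuse and repurposing property the slide lists.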
  9. Taverna Workflow Components
     • Scufl: Simple Conceptual Unified Flow Language
  10. Taverna in an Open World
     • Open domain services and resources
     • Taverna accesses 3000+ services
     • Third party: we don't own them and we didn't build them
     • All the major providers
         • NCBI, DDBJ, EBI …
     • Enforce NO common data model
     • Quality Web Services considered desirable
  11. What can you do with myGrid?
     • ~33,000 downloads
     • Users worldwide
         • US, Singapore, UK, Europe, Australia
     • Systems biology
     • Proteomics
     • Gene/protein annotation
     • Microarray data analysis
     • Medical image analysis
     • Heart simulations
     • High throughput screening
     • Genotype/phenotype studies
     • Health informatics
     • Astronomy
     • Chemoinformatics
     • Data integration
  12. Examples: Early Pioneers
     • Williams-Beuren Syndrome
     • Identifying new human genome sequence, and the genes contained within it, in an area of the genome associated with the disease
     • Improve understanding of the relationship between genotype and phenotype
     • Four workflow cycles totalling ~10 hours
     • The gap was correctly closed and all known features identified
     • 314,004 bp extension
     • All nine known genes identified (40/45 exons identified): CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, WBSCR22, WBSCR24, WBSCR27, WBSCR28
     • Diagram labels: CTA-315H11, CTB-51J22, ELN, WBSCR14, RP11-622P13, RP11-148M21, RP11-731K22
  13. Trypanosomiasis in Africa
     http://www.genomics.liv.ac.uk/tryps/trypsindex.html
     • Resistance to parasites in different breeds of cattle
     • Involves:
         • Microarray analysis
         • Classical genetics
         • Biochemical pathway analysis
     • Large data sets, large result sets
  14. Is Taverna Just for Biologists?
     • Nothing in the code is specific to biology
     • The default services ARE bio services, but Taverna doesn't care what they are
     • Services from other science disciplines can simply be slotted in
  15. Other Examples
     • Medical imaging
         • MIAS-GRID: investigating cartilage thickness during drug trials
         • 2D and 3D brain image registration
     • Chemoinformatics
         • CDK-Taverna: project to provide the CDK chemoinformatics tool set as web services
         • Chimatica: Virtual Drug Candidate Production Environment
     • Health informatics
         • PsyGrid: investigating first episode psychosis
  16. Dilbert
  17. What Taverna Gives You
     • Automation
     • Implicit iteration
     • Implicit parallelisation
     • Support for nested workflow construction
     • Error handling
         • Retry, failover and automatic substitution of alternates
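As a rough illustration of three items on the What Taverna Gives You slide (implicit iteration, retry, and failover to an alternate), here is a minimal Python sketch. It does not reproduce Taverna's engine or configuration. The helpers `call_with_retry` and `run_step` and the two toy services are hypothetical names invented for this example, and parallel execution is left out.

```python
import time


def call_with_retry(service, item, retries=2, delay=0.0):
    """Call service(item), retrying up to `retries` extra times on failure."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return service(item)
        except Exception as err:  # illustrative: treat any error as retryable
            last_error = err
            if attempt < retries:
                time.sleep(delay)
    raise last_error


def run_step(services, data, retries=2):
    """Apply the first service that works; lists are iterated implicitly."""
    if isinstance(data, list):
        # Implicit iteration: a step written for one item is mapped over a list.
        return [run_step(services, item, retries) for item in data]
    last_error = None
    for service in services:  # failover: try each alternate in order
        try:
            return call_with_retry(service, data, retries)
        except Exception as err:
            last_error = err
    raise RuntimeError("all alternates failed") from last_error


# Hypothetical services: the primary always fails, the alternate reverses text.
def flaky_primary(seq):
    raise ConnectionError("primary service unavailable")


def backup(seq):
    return seq[::-1]


reversed_seqs = run_step([flaky_primary, backup], ["acat", "ttct"])
```

The caller passes a list where the service expects a single string and never writes a loop or an error handler, which is the convenience the slide describes.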
  18. Extensibility
     • Accepts many types of services:
         • Web services, beanshell scripts, local Java scripts, JDBC connections, etc.
     • Easy to add your own services
     • Plug-in architecture
     • Easy to build new processor types
     • Easy to extend to include alternative results viewers
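The plug-in idea on the Extensibility slide, where new processor types are added without changing the core, can be sketched as a small registry. This is an assumption-laden illustration in Python, not Taverna's actual Java plug-in mechanism. `PROCESSOR_TYPES`, `processor_type`, `make_processor` and both processor classes are hypothetical.

```python
# Hypothetical registry mapping a processor-type name to its class.
PROCESSOR_TYPES = {}


def processor_type(name):
    """Class decorator that registers a processor type under `name`."""
    def register(cls):
        PROCESSOR_TYPES[name] = cls
        return cls
    return register


@processor_type("script")
class ScriptProcessor:
    """Wraps a local function, loosely analogous to a scripted service."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)


@processor_type("constant")
class ConstantProcessor:
    """Always returns the same value, whatever the input."""

    def __init__(self, value):
        self.value = value

    def invoke(self, _ignored=None):
        return self.value


def make_processor(kind, *args):
    """Build a processor by type name; the core never imports plug-ins directly."""
    if kind not in PROCESSOR_TYPES:
        raise ValueError(f"unknown processor type: {kind}")
    return PROCESSOR_TYPES[kind](*args)


length = make_processor("script", len).invoke("acatttctac")
```

Adding a new kind of service then means writing one decorated class, while code that calls `make_processor` stays unchanged.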
  19. Could Taverna be used for Astronomy?
     • Lots of data (although individual data items might be bigger)
     • Distributed data
     • Chains of analyses
     • MORE standards for data formatting/exchange
     • Investigated by AstroGrid and SAMPO
  20. Sampo: European Southern Observatory project
     • Workflows for data reduction
     • Reasons for choosing Taverna
         • Open source
         • Free
         • Allows customisation
         • Easy to use and adapt
         • Designed for science (most workflow engines are meant for business applications)
         • Very robust
         • Actively developed
         • Good support for web services
  21. AstroGrid Workflows
     • Evaluation of Taverna
     • Building plug-ins for the AstroGrid project
     • In the process of gathering AstroGrid requirements
     • Still things to address…
  22. Coming soon… Taverna 2
     • A complete redesign of Taverna from the ground up to enable:
         • Streaming data
         • Management of large volumes of data
         • Better remote workflow execution
         • Integration with grid resources
         • Monitoring and steering
     • Beta release due at the end of summer 2007
  23. myGrid acknowledgements
     • Carole Goble, Norman Paton, Robert Stevens, Anil Wipat, David De Roure, Steve Pettifer
     • OMII-UK: Tom Oinn, Katy Wolstencroft, Daniele Turi, June Finch, Stuart Owen, David Withers, Stian Soiland, Franck Tanoh, Matthew Gamble, Alan Williams
     • Research: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Antoon Goderis, Alastair Hampshire, Qiuwei Yu, Wang Kaixuan
     • Current contributors: Matthew Pocock, James Marsh, Khalid Belhajjame, PsyGrid project, Bergen people, EMBRACE people
     • User Advocates and their bosses: Simon Pearce, Claire Jennings, Hannah Tipney, May Tassabehji, Andy Brass, Paul Fisher, Peter Li, Simon Hubbard, Tracy Craddock, Doug Kell, Marco Roos, Matthew Pocock, Mark Wilkinson
     • Past contributors: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Juri Papay, Savas Parastatidis, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Victor Tan, Paul Watson, and Chris Wroe
     • Industrial: Dennis Quan, Sean Martin, Michael Niemi (IBM), Chimatica
     • Funding: EPSRC, Wellcome Trust
