Owen bosc2010 taverna2.2-cows


Published in: Technology, Business
  1. 1. Analysing African and European cattle with Taverna 2.2 <ul><li>Stuart Owen </li></ul><ul><li>Based on the work by: </li></ul><ul><li>Professor Andy Brass and Mohammad Khodadadi </li></ul><ul><li>University of Manchester, UK </li></ul><ul><li>Harry Noyes and Steve Kemp </li></ul><ul><li>University of Liverpool, UK </li></ul><ul><li>BOSC2010 – Boston. </li></ul>
  2. 2. Analysing African and European cattle with Taverna 2.2: A bioinformatics case study demonstrating the use of the Taverna 2 workflow system. This is a snapshot of some exciting science that is currently in progress.
  3. 3. Analysing African and European cattle with Taverna 2.2 <ul><li>10,000 years separation </li></ul><ul><li>African livestock adaptations: </li></ul><ul><ul><li>Hardier </li></ul></ul><ul><ul><li>Better disease resistance </li></ul></ul><ul><li>Potential outcomes: </li></ul><ul><ul><li>Food security </li></ul></ul><ul><ul><li>Understanding resistance </li></ul></ul><ul><ul><li>Understanding environmental conditions </li></ul></ul><ul><ul><ul><li>Drought </li></ul></ul></ul><ul><ul><ul><li>Parasites </li></ul></ul></ul><ul><li>Understanding diversity </li></ul>
  4. 4. Workflow and phases: MAP, FILTER, ANALYSIS
  5. 5. Workflow and phases: Input SNP file; Populate DB with start SNPs and resource version numbers; LiftOver: maps between UMD3 and BTA4 cow assemblies; Exon positions from ENSEMBL; Find SNPs in exon regions; PolyPhen to mark “dangerous” SNPs
  6. 6. Little more about the phases … <ul><li>The input SNP file is the result of 15-fold average coverage of an entire Boran cow </li></ul><ul><ul><li>11.9 million SNPs described. </li></ul></ul><ul><ul><li>Resulting from Next Generation Sequencing. </li></ul></ul><ul><li>All initial data is stored within a database, mapped by a runID to the versions of ENSEMBL, LiftOver and PolyPhen. </li></ul><ul><li>LiftOver provides a mapping between 2 different reference cow assemblies: </li></ul><ul><ul><li>UMD3: more accurate assembly </li></ul></ul><ul><ul><li>BTA4: better annotated and ENSEMBL friendly </li></ul></ul><ul><ul><li>Store BTA4 position, chromosome and allele in the database </li></ul></ul><ul><ul><li>Filter out, but store, results where there is a mismatch between the bases. </li></ul></ul>
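The "filter out, but store" step after LiftOver can be pictured as a simple partition: every SNP is kept, but only those whose bases agree move on to the next phase. The following is a minimal Python sketch of that idea, not the workflow's actual code. The `LiftedSnp` record, its field names, and the reading of "mismatch" as a disagreement between the UMD3 and BTA4 reference bases are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LiftedSnp:
    # Hypothetical record; field names are illustrative, not the project's schema.
    snp_id: str
    chromosome: str
    bta4_position: int
    umd3_base: str  # assumed: reference base at the original UMD3 position
    bta4_base: str  # assumed: reference base at the lifted-over BTA4 position
    allele: str


def partition_by_base_match(
    snps: Iterable[LiftedSnp],
) -> Tuple[List[LiftedSnp], List[LiftedSnp]]:
    """Split SNPs into (matched, mismatched) without discarding any of them."""
    matched: List[LiftedSnp] = []
    mismatched: List[LiftedSnp] = []
    for snp in snps:
        # Compare case-insensitively so 'a' and 'A' count as the same base.
        if snp.umd3_base.upper() == snp.bta4_base.upper():
            matched.append(snp)
        else:
            mismatched.append(snp)
    return matched, mismatched


example = [
    LiftedSnp("snp1", "1", 1050, "A", "A", "G"),
    LiftedSnp("snp2", "1", 2075, "C", "T", "G"),
    LiftedSnp("snp3", "2", 310, "g", "G", "A"),
]
matched, mismatched = partition_by_base_match(example)
```

In this sketch both lists would be written to the database; only `matched` would be passed to the ENSEMBL phase.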
  7. 7. … Little more about the phases <ul><li>ENSEMBL is used to retrieve annotations about the SNPs: </li></ul><ul><ul><li>For all SNPs whose bases match, we check every cow exon in ENSEMBL for a match (exon start < SNP position < exon end), and also store the gene ID, allele, associated gene names, and biotype. </li></ul></ul><ul><ul><li>Filter out, but store, ENSEMBL/BTA4 mismatches. </li></ul></ul><ul><ul><li>A second phase fetches the consequence according to the BTA4 positions. </li></ul></ul><ul><ul><li>From this information a file is generated for PolyPhen, containing all SNPs with a non-synonymous consequence. </li></ul></ul><ul><li>A local instance of PolyPhen is queried using a file generated from the ENSEMBL annotations to produce an indication of the level to which a SNP changes the protein. </li></ul><ul><li>Outcome is an annotated database of ~20,000 “interesting” SNPs </li></ul>
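The exon matching rule on this slide (exon start < SNP position < exon end) can be sketched as a per-chromosome lookup. The Python below is an illustrative sketch only: the `Exon` record and `ExonIndex` class are hypothetical names, the coordinates are made up, and the real workflow retrieves exon positions from ENSEMBL rather than from an in-memory list. Exons are sorted by start so that a binary search narrows the candidates before the end coordinate is checked. Overlapping exons are allowed, so a SNP may match more than one.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Exon:
    # Hypothetical record; the real data comes from ENSEMBL.
    gene_id: str
    chromosome: str
    start: int
    end: int


class ExonIndex:
    """Per-chromosome exon lookup using the slide's strict inequality."""

    def __init__(self, exons: Iterable[Exon]) -> None:
        grouped: Dict[str, List[Exon]] = defaultdict(list)
        for exon in exons:
            grouped[exon.chromosome].append(exon)
        # Sort each chromosome's exons by start so bisect can be used.
        self._exons = {
            chrom: sorted(items, key=lambda e: e.start)
            for chrom, items in grouped.items()
        }
        self._starts = {
            chrom: [e.start for e in items]
            for chrom, items in self._exons.items()
        }

    def exons_containing(self, chromosome: str, position: int) -> List[Exon]:
        exons = self._exons.get(chromosome, [])
        starts = self._starts.get(chromosome, [])
        # Every exon before this index has start < position.
        upper = bisect_left(starts, position)
        # Keep only those that also satisfy position < end.
        return [e for e in exons[:upper] if position < e.end]


index = ExonIndex([
    Exon("GENE_A", "1", 100, 200),
    Exon("GENE_B", "1", 150, 300),
    Exon("GENE_C", "2", 100, 200),
])
hits = index.exons_containing("1", 160)
```

Because the comparison is strict on both sides, a SNP sitting exactly on an exon boundary is not counted as a match in this sketch, which follows the rule as written on the slide.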
  8. 8. Packaged as a sharable virtual machine image [Diagram labels: 11.9 million SNPs; LiftOver; ENSEMBL; PolyPhen; Results; 50,000 annotated SNPs; 20,000 annotated SNPs + provenance]
  9. 9. Packaged as a sharable virtual machine image <ul><li>LiftOver, Taverna, PolyPhen and the workflow are packaged as a virtual machine image. </li></ul><ul><ul><li>Everything (except ENSEMBL) is run locally </li></ul></ul><ul><ul><li>Full cow analysis takes 2 days – previous attempts would have taken an estimated 3 months for the PolyPhen phase alone. </li></ul></ul><ul><li>Results and experiment can be distributed and shared as a complete package </li></ul><ul><ul><li>Re-use </li></ul></ul><ul><ul><li>Repeatable </li></ul></ul><ul><ul><li>Reproducible </li></ul></ul><ul><li>Future plans to deploy the image on “The Cloud” </li></ul>
  10. 10. Packaged as a sharable virtual machine image [Diagram labels: ENSEMBL; Annotated DB; MAP, FILTER, ANALYSIS repeated for each cow: Boran cow, Sheko cow, N’Dama cow, etc. …]
  11. 11. Highlights new Taverna 2.2 features <ul><li>Officially released last Wednesday – July 7th 2010 </li></ul><ul><li>Loading and sharing of service sets </li></ul><ul><li>Ability to load and edit workflows that contain services that are offline </li></ul><ul><li>Reporting on the state of the workflow </li></ul><ul><li>Tabular representation of a workflow run </li></ul><ul><li>Retrying and parallelization of service calls </li></ul><ul><li>Consistent representation of the intermediate and workflow results </li></ul><ul><li>Pause/resume/cancel of a running workflow </li></ul><ul><li>Command line tool that allows you to execute workflows outside of the workbench. </li></ul><ul><li>Faster, Better, Easier </li></ul>