Phylotastic reconciliation



Author: Jamie Estill








Usage Rights

© All Rights Reserved

  • The iPlant-sponsored Tree Reconciliation working group is one of six main working groups in the iPlant Tree of Life (iPToL) program. The overall goal of the iPToL project is to develop the cyberinfrastructure needed to assemble, visualize and analyze the plant tree of life. The goals of the Tree Reconciliation working group include developing database tools for 'post-tree' analysis of the reconciliation of gene trees to species trees. This is post-tree in the sense that the species tree is taken as a given, produced by work under development in the Big Trees group.
  • Gene tree reconciliations allow us to map processes and events from the gene tree onto the species tree. These include gene duplications, gene losses, lineage sorting and horizontal transfer.
  • The utility of gene tree reconciliation … Ancestral polyploidy events are a major component of plant genome evolution.
  • Existing tools for gene tree reconciliation include software to generate reconciliations (TreeBeST, primeGSR), software to visualize these reconciliations (primeTV/fltreebest), and databases such as Ensembl Compara that store reconciled gene trees along with information on the sequences, alignments and locations of the genes comprising the reconciled gene families.
  • Our initial goals in extending cyberinfrastructure for gene tree reconciliation involved developing a static database of precomputed reconciliations.
  • We extended the Ensembl Compara database design to include precomputed species trees, precomputed gene trees and a reconciliation mapping between the two. We have also added support for ontologies to tag attributes of trees, nodes and functional gene annotation, and developed a Tree … We have high-throughput pipelines for TreeBeST, primeGSR and NOTUNG that generate large numbers of reconciliations and load them into the database. We can also populate the functional annotation of genes using output from the annot8r functional annotation program. We have also developed a new interface for visualizing reconciled trees. This interface displays reconciled trees stored in the database and supports queries to find reconciled trees within the database.
  • The GUI allows simultaneous viewing of the species tree and a gene tree reconciled to that species tree. These trees “interact” such that selecting branches in one tree can highlight nodes and edges in the other.
  • The gene tree node colors highlight the location of duplication and speciation events ..
  • .. the species tree maps the location of duplication events from the gene tree onto the species tree. Duplication events are shown here as green triangles.
  • The GUI also provides a way to find reconciled gene families within the database. Queries are supported for:
      • BLAST: search for gene families in the database that match a DNA or protein sequence query.
      • GO Term: search for gene families that have been annotated with a specific GO term.
      • Locus Name: identify the gene families that contain a known locus name.
      • Gene Family Name: jump directly to a gene family by name.
  • Having reconciliations mapped to a database that can be queried like this is awesome, and allows us to ask new questions.
  • A difficulty here is determining the species source of a gene given only the gene information. The third component, shown here as NEXML encoding, would depend in part on the standards used by phylotastic for communication among the components of the phylotastic workflows. See Daniel Packer’s GSoC project for notes on NEXML encoding.
  • DNA Subway is an AWESOME education tool that takes users through the process of genome annotation. Starting with genome sequence data (such as a sequenced BAC), students find the genes and can even generate gene trees, using their annotated gene as the query sequence for automated gene tree generation. The ‘Prospect Genome’ track currently dead-ends with this gene tree. Given a system that could accept that gene tree as input for reconciliation, it would be possible to generate a reconciled gene tree, which would be an awesome way to introduce students to the concepts of orthology and paralogy using data that they have generated themselves. In this case the initial input is unannotated genome sequence, so it would be possible to go from raw genome sequence data to reconciled gene trees using an intuitive interface that is simple enough to use in undergraduate education. This is awesome because the input could be student-generated sequence data that has never been annotated before, and the pipeline could result in a set of student-derived reconciled gene trees.
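
The mapping of duplication events from a gene tree onto a species tree, described in the notes above, can be illustrated with the classic least-common-ancestor (LCA) reconciliation. This is a minimal sketch only: it is not the method used by TreeBeST, primeGSR or NOTUNG, the tuple-based tree encoding is an assumption made for brevity, and the species and gene names are hypothetical.

```python
from collections import Counter

def leaf_set(tree, out):
    """Return the frozenset of leaf names under `tree`; append every clade to `out`."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(leaf_set(child, out) for child in tree))
    out.append(clade)
    return clade

def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)

def reconcile(gene_tree, gene_to_species, species_clades, events):
    """Map a gene-tree node to a species-tree clade and record its event type."""
    if isinstance(gene_tree, str):
        return lca(species_clades, frozenset([gene_to_species[gene_tree]]))
    child_maps = [reconcile(child, gene_to_species, species_clades, events)
                  for child in gene_tree]
    mapped = lca(species_clades, frozenset().union(*child_maps))
    # LCA rule: a node mapping to the same clade as one of its children
    # is a duplication; otherwise it is a speciation.
    kind = "duplication" if mapped in child_maps else "speciation"
    events.append((kind, mapped))
    return mapped

# Hypothetical example data: trees are nested tuples, leaves are strings.
species_tree = (("rice", "maize"), "arabidopsis")
gene_tree = ((("r1", "m1"), ("r2", "m2")), "a1")
gene_to_species = {"r1": "rice", "r2": "rice", "m1": "maize",
                   "m2": "maize", "a1": "arabidopsis"}

species_clades = []
leaf_set(species_tree, species_clades)

events = []
reconcile(gene_tree, gene_to_species, species_clades, events)

# Count duplications per species-tree node, the kind of query the
# reconciliation database is meant to answer.
dup_counts = Counter(clade for kind, clade in events if kind == "duplication")
for clade, n in dup_counts.items():
    print(sorted(clade), n)
```

In this toy example the single duplication maps to the common ancestor of rice and maize, which is the species-tree node where the GUI would draw a duplication marker. Gene losses, lineage sorting and horizontal transfer are not modeled here.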

Phylotastic reconciliation: Presentation Transcript

  • Gene Tree/Species Tree Reconciliation. Phylotastic Hackathon, June 4, 2012
  • iPlant Tree of Life (iPTOL)
      • Tree Reconciliation
      • Big Trees
      • Data Assembly
      • Trait Evolution
      • Data Integration
      • Tree Visualization
  • Gene Tree Reconciliation: projection of gene trees onto a species tree
      • gene duplications
      • gene losses
      • lineage sorting
      • horizontal transfer
  • Gene Tree Reconciliation
      • Locating gene duplications allows us to identify orthologs and paralogs
      • Identify gene composition in inferred ancestral genomes
      • Map the positions of ancestral polyploidy events
      • Contribute to the study of the “fate” of duplicated genes
      • Address questions of gene family coevolution
  • Existing TR Cyberinfrastructure (diagram): Generate Reconciliations (TreeBeST, primeGSR); EC (Gene Trees); Visualize Reconciliations (primeTV, fltreebest)
  • Extending TR Cyberinfrastructure
      • Increased interoperability among the component pieces
      • Query the location of gene duplications on the species tree
      • Integrate tree visualization tools that scale to many thousands of nodes
      • Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure
  • Extending TR Cyberinfrastructure (diagram): Generate Reconciliations (TreeBeST, primeGSR, NOTUNG); Visualize Reconciliations (primeTV, fltreebest); Gene Trees; Reconciled; Species Trees; Ontology; Functional Annotation (annot8r)
  • Tree Reconciliation GUI
  • Tree Reconciliation GUI
  • Tree Reconciliation GUI
  • Tree Reconciliation GUI Queries
      • BLAST
      • GO Term
      • Locus Name
      • Gene Family Name
  • Current Limitations
      • Users query against a pre-computed set of reconciliations
          • We generate the species trees
          • We generate the gene trees given alignments
          • We generate reconciliation mappings
      • Reconciliation visualization is currently tied to the database
      • Users can NOT submit their own data (gene trees or alignments) for reconciliation
  • Making TR Phylotastic
      • Allow users to generate reconciliations using their own data
          • Supply a species tree OR
          • Supply a gene family alignment
  • Phylotastic Components
      • Name resolution
          • Given a gene tree or alignments, determine the species list
      • Tree Pruner
          • Given the species list above, generate the species tree required for reconciliation
      • NEXML encoding
          • Return reconciled tree using NEXML
  • A Phylotastic DNA Subway ..
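
The first two Phylotastic components listed above (name resolution and tree pruning) can be sketched as follows. This is an illustration under stated assumptions, not the Phylotastic services themselves: the functions are hypothetical, trees are encoded as nested tuples, and gene labels are assumed to follow a "species|locus" convention, which sidesteps the real difficulty of determining the species source of a gene. NEXML output is omitted.

```python
def gene_leaves(tree):
    """Return the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in gene_leaves(child)]

def species_of(gene_label):
    """ASSUMPTION: labels look like '<species>|<locus>'; real data needs name resolution."""
    return gene_label.split("|", 1)[0]

def prune(tree, keep):
    """Return the subtree induced by the leaves in `keep`, or None if none remain."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse internal nodes left with a single child
    return tuple(kept)

# Hypothetical inputs: a larger reference species tree and a user gene tree.
big_tree = ((("rice", "maize"), "sorghum"), ("arabidopsis", "poplar"))
user_gene_tree = (("rice|g1", "maize|g2"), "arabidopsis|g3")

# Step 1 (name resolution): derive the species list from the gene tree.
species_list = {species_of(label) for label in gene_leaves(user_gene_tree)}

# Step 2 (tree pruner): extract the species tree needed for reconciliation.
pruned_species_tree = prune(big_tree, species_list)
print(pruned_species_tree)
```

The pruned tree contains only the species present in the user's gene tree, so it can be passed, together with that gene tree, to a reconciliation step.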