iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation
The reconciliation of gene trees to species trees makes use of the species tree to infer the history of evolutionary events such as gene duplication and loss in an individual gene family history. A cyberinfrastructure for tree reconciliation (TR) has been developed that includes an extensible pipeline for high-throughput reconciliation of gene trees to species trees, database utilities, and a visualization tool. The TR database schema extends the Ensembl Compara database to include species trees and the mapping between the nodes of a gene tree and the species tree used for that reconciliation, which permits large-scale analysis of the distribution of gene tree events on the species tree, and comparison of the evolutionary timing of events between gene trees. The Chado controlled vocabulary module was also incorporated to support the use of OBO ontologies to tag attribute values within the database. The schema supports multiple reconciliations for each gene tree, and an ontology for TR was developed to support storage of metadata for TR methodologies. Additions to the BioPerl Tree API allow for direct import of reconciled trees in PRIME format, and utilities have been provided to populate the database from de novo analyses of gene tree reconciliations. Queries against the database are facilitated by a RESTful web API that allows for BLAST searches against gene sequences in the database, as well as searches for GO term assignments among gene families. These tools support comparative analysis of reconciliation methodologies, which we illustrate by reporting an evaluation of the accuracy of methods that reconcile gene trees individually relative to synteny-informed reconstructions of genome duplication history. We also illustrate a novel visualization tool for interactively exploring the mapping between gene trees and the species tree.

  • I am Jamie Estill, a postdoc in Jim Leebens-Mack's lab at the University of Georgia. Today I am going to talk about work that we have done over the past year in collaboration with the group of people you see here, who make up the iPlant Tree Reconciliation Working Group.
  • This iPlant-sponsored Tree Reconciliation Working Group is one of six main working groups that are part of the iPlant Tree of Life (iPToL) program. The overall goals of the iPToL project are to develop the cyberinfrastructure needed to assemble, visualize and analyze the plant tree of life. The goals of the Tree Reconciliation Working Group include the development of database tools for 'post-tree' analysis of the reconciliation of gene trees to species trees.
  • Gene tree reconciliations allow us to map processes and events from the gene tree onto the species tree. These include gene duplications, gene losses, lineage sorting, and horizontal transfer.
  • The utility of gene tree reconciliation …
  • Existing tools for gene tree reconciliation include software to generate reconciliations (TreeBeST, primeGSR); software to visualize these reconciliations (primeTV/fltreebest); and databases such as Ensembl Compara that allow us to store reconciled gene trees as well as information regarding the sequences, alignments and locations of the genes comprising the reconciled gene families.
  • Our initial goals in extending cyberinfrastructure for gene tree reconciliation
  • A first step in extending the TR cyberinfrastructure is to include species trees in the database. This establishes the framework for direct queries on species trees and their topologies.
  • Species trees exist in the database as a set of nodes and edges. These trees are indexed as nested sets, using left and right index values that allow for fast queries on tree topology.
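The nested-set indexing mentioned in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's database code: the function names `nested_set_index` and `descendants` are hypothetical, and the five-node topology (root E, internal node D, leaves A, B, C) is assumed for the example.

```python
from typing import Dict, List, Tuple


def nested_set_index(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (left, right) index values with a depth-first walk of the tree."""
    index: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter              # index taken when the walk enters the node
        for child in children.get(node, []):
            visit(child)
        counter += 1
        index[node] = (left, counter)  # right index taken when the walk leaves

    visit(root)
    return index


def descendants(index: Dict[str, Tuple[int, int]], node: str) -> List[str]:
    """A node is a descendant when its interval lies strictly inside the ancestor's."""
    left, right = index[node]
    return sorted(n for n, (l, r) in index.items() if left < l and r < right)


# Hypothetical species tree: E is the root, D is an internal node (assumed topology).
species_tree = {"E": ["D", "C"], "D": ["A", "B"]}
index = nested_set_index(species_tree, "E")
print(index["D"], descendants(index, "D"))
```

The point of the scheme is that a subtree query becomes a range comparison on two integer columns, with no recursive walk at query time.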
  • Once species trees and gene trees are both represented in the schema, it is then possible to represent the reconciliation of gene trees to species trees as the mapping of the nodes of the gene trees onto the nodes and edges of the species tree.
  • The basic concept is to map the nodes of the gene tree onto the nodes and edges of the species tree.
  • In actual use, host node ids for the host tree would be integers from a host tree stored in the species tree table.
  • The reconciliation in the database is therefore a mapping of the nodes of the guest tree onto the nodes and edges of the species tree, while the topologies of the two trees are stored independently of the reconciliation mapping itself. This allows us to represent a number of basic events and processes …
  • Speciation events …
  • Gene duplications are nodes of the gene tree that map onto an edge of the species tree.
  • HT events can map from one node to another node in the host tree …
  • HT events could also map from a node in the species tree to an edge ..
  • Since the topology of the gene tree is stored separately from the topology of the species tree, it is possible to source an incoming reconciled node completely outside of the scope of the current species tree. There are reasons to support this. For example, the source species for an HGT may not be included in the species tree, as in the transfer of cyanobacterial genes to an angiosperm where the species tree is limited to angiosperms.
  • This also allows for the mapping of nodes in the gene tree that are beyond the LCA of the species tree.
  • Additional metadata are also required to tag the nodes in the reconciliation. For example, a gene tree node mapped to an edge in the species tree could be a duplication event or a horizontal transfer event. An ontology for tree reconciliation (TRON) has been developed to support this tagging, and ontology support has been added to the database by including the controlled vocabulary tables from Chado.
  • This allows us to richly tag nodes in the reconciliation mapping …
  • For example, we can tag nodes in the reconciliation as speciation or duplication nodes.
  • Since the tree reconciliation database now supports ontologies, it is possible to annotate the function of individual genes using controlled vocabularies for gene function.
  • We have developed pipelines that allow for high-throughput reconciliation of gene families and for importing the reconciled results into the database.
  • We have also developed a new interface for visualizing reconciled trees. This interface can display reconciled trees stored in the database and supports queries to find reconciled trees within the database.
  • The GUI allows for simultaneously viewing the species tree and a gene tree reconciled to that species tree. These trees “interact” such that selecting branches in one tree can highlight nodes and edges in the other.
  • The gene tree node colors highlight the locations of duplication and speciation events ..
  • .. the species tree maps the location of duplication events from the gene tree onto the species tree. Duplication events are shown here as green triangles.
  • The species tree also serves as a way to search for duplication events on branches. Selecting a “triangle” on a branch brings up a table listing the gene families that have a duplication event on that branch. This table provides a point of access to the word cloud for the GO terms associated with the collection of genes within the gene family.
  • The GUI also provides a way to find families within the database, with four kinds of query. BLAST: search for gene families in the database that match a DNA or protein sequence query. GO Term: search for gene families that have been annotated with a specific GO term. Locus Name: identify the gene families that contain a known locus name. Gene Family Name: jump directly to a gene family by name.
  • We are also developing command-line code that allows for high-throughput queries, for example iterating across all branches in a species tree and counting the number of reconciliations for those branches.
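The kind of per-branch count described in this note might look roughly like the Python sketch below. It is not the command-line code referred to above; the function name `count_per_branch` is hypothetical, and the rows are the edge-related entries (guest node, host parent, host child) from the reconciliation map table shown on slide 22. A guest node is counted for a branch only when its host parent and host child are both present and differ, which is an assumption about what "on a branch" means here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]  # (guest node, host parent, host child)

# Rows 13 to 23 of the slide 22 mapping table; None stands for NULL.
map_rows = [
    (13, "D", "B"), (14, "D", "B"), (15, "E", "C"), (16, "D", "D"),
    (17, "D", "D"), (18, "E", "C"), (19, "E", "D"), (20, "E", "C"),
    (21, "E", "D"), (22, None, "E"), (23, None, "E"),
]


def count_per_branch(rows: Iterable[Row]) -> Counter:
    """Count guest nodes mapped onto each host edge (parent, child)."""
    counts: Counter = Counter()
    for _guest, parent, child in rows:
        # Skip nodes mapped onto a host node (parent == child) or above the root.
        if parent is not None and child is not None and parent != child:
            counts[(parent, child)] += 1
    return counts


for branch, n in sorted(count_per_branch(map_rows).items()):
    print(branch, n)
```

Against a real database the same count would more naturally be a grouped query on the mapping table; the in-memory version just makes the logic visible.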
  • Current development …
  • Transcript

    • 1. Extending Cyberinfrastructure for Gene Tree Reconciliation
      • James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision
      • iEvoBio, June 21, 2011
    • 2.
      • iPlant Tree of Life (iPTOL)
      • Tree Reconciliation
      • Big Trees
      • Data Assembly
      • Trait Evolution
      • Data Integration
      • Tree Visualization
    • 3. Gene Tree Reconciliation
      • Projection of gene trees onto a species tree
      • gene duplications
      • gene losses
      • lineage sorting
      • horizontal transfer
    • 4. Gene Tree Reconciliation
      • Locating gene duplications allows us to identify orthologs and paralogs
      • Identify gene composition in inferred ancestral genomes
      • Map of the positions of ancestral polyploidy events
      • Contribute to the study of the “fate” of duplicated genes
      • Address questions of gene family coevolution
    • 5. Existing Cyberinfrastructure [diagram: Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest); Gene Trees (EC)]
    • 6. Extending Cyberinfrastructure
      • Increased interoperability among the component pieces
      • Query the location of gene duplications on the species tree
      • Integrate tree visualization tools that scale to many thousands of nodes
      • Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure
    • 7. Extending Cyberinfrastructure [diagram: Species Trees; Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 8. Adding Species Trees
    • 9. Extending Cyberinfrastructure [diagram: Species Trees; Reconciled; Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 10. [Figure: reconciled tree. Gene tree: 12 genes from 3 species, 23 nodes (numbered 1 to 23). Species tree: 3 species, 5 nodes (A to E).]
    • 11. Mapping Host to Guest
      • Map the guest tree onto the host tree by defining the position on a host tree edge that the gene tree node maps to
      [Figure labels: host tree nodes A (parent node) and B (child node) joined by a host tree edge; guest tree nodes 1 and 2 joined by a guest tree edge]
    • 12. Mapping Host to Guest
      • Guest nodes can map to four general locations on host edges
      • 1: Inside parent node
      • 2: Inside child node
      • 3: Edge between host nodes
      • 4: Outside of host edge
    • 13. Mapping Host to Guest
      • Locations stored in a reconciliation map table
      map id   guest node   host parent node   host child node
      1001     1            A                  A
      1002     2            B                  B
      1003     3            A                  B
      1004     4            NULL               A
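As a rough illustration of how rows of the reconciliation map table could be interpreted, the Python sketch below turns a (host parent, host child) pair into a location on the host tree. The names `MapRow` and `location` are hypothetical, and the rules are an assumed reading of the slides: equal parent and child means the guest node sits on a host node, differing values mean it sits on the edge between them, a NULL parent means it lies above the host node, and two NULLs mean it lies outside the host tree altogether.

```python
from typing import NamedTuple, Optional


class MapRow(NamedTuple):
    map_id: int
    guest_node: int
    host_parent: Optional[str]  # None stands for NULL
    host_child: Optional[str]


def location(row: MapRow) -> str:
    """Describe where a guest node sits relative to the host tree (assumed rules)."""
    if row.host_child is None:
        if row.host_parent is None:
            return "outside host tree"
        raise ValueError("host_child is NULL but host_parent is not")
    if row.host_parent is None:
        return f"above host node {row.host_child}"
    if row.host_parent == row.host_child:
        return f"on host node {row.host_child}"
    return f"on host edge {row.host_parent}->{row.host_child}"


# The four example rows from the reconciliation map table.
rows = [
    MapRow(1001, 1, "A", "A"),
    MapRow(1002, 2, "B", "B"),
    MapRow(1003, 3, "A", "B"),
    MapRow(1004, 4, None, "A"),
]
for row in rows:
    print(row.guest_node, location(row))
```

Note that the location alone does not say what kind of event a node represents; a node on an edge could be a duplication or a horizontal transfer, which is why separate node attributes are needed.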
    • 14. Reconciliation
      • Reconciliation is a mapping of the nodes of guest tree (gene tree) onto the nodes and edges of the host tree (species tree)
      • The topologies of the two trees are stored separately from the mapping of the reconciliation itself
    • 15. Speciation
      map_id   guest   host_parent   host_child
      301      1       A             A
      302      2       B             B
      303      3       C             C
      304      4       D             D
    • 16. Gene Duplication
      map_id   guest   host_parent   host_child
      101      1       A             A
      102      2       A             B
      103      3       B             B
      104      4       B             B
    • 17. Horizontal Transfer to Node
      map_id   guest   host_parent   host_child
      201      1       A             A
      202      2       B             B
      203      3       D             D
    • 18. Horizontal Transfer to Edge
      map_id   guest   host_parent   host_child
      201      1       A             A
      202      2       B             B
      203      3       C             D
    • 19. Alien Horizontal Transfer
      • Gene source outside of species tree
      map_id   guest   host_parent   host_child
      201      1       NULL          NULL
      202      2       NULL          NULL
      203      3       C             D
    • 20. Gene Nodes Beyond Species LCA
      map_id   guest   host_parent   host_child
      301      1       NULL          A
      302      2       A             A
      303      3       A             A
      304      4       C             C
      305      5       C             C
      306      6       D             D
      307      7       D             D
    • 21. Extending Cyberinfrastructure [diagram: Ontology (TRON); Species Trees; Reconciled; Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 22. Reconciled Node Attributes
      • Reconciliation node attributes stored in a separate table
      map_id   node_id   host_parent   host_child
      1001     1         A             A
      1002     2         A             A
      1003     3         A             A
      1004     4         B             B
      1005     5         B             B
      1006     6         B             B
      1007     7         B             B
      1008     8         C             C
      1009     9         C             C
      1010     10        C             C
      1011     11        C             C
      1012     12        C             C
      1013     13        D             B
      1014     14        D             B
      1015     15        E             C
      1016     16        D             D
      1017     17        D             D
      1018     18        E             C
      1019     19        E             D
      1020     20        E             C
      1021     21        E             D
      1022     22        NULL          E
      1023     23        NULL          E
    • 23. Reconciliation Map Table and Reconciliation Node Attributes
      • Reconciliation Map Table: same 23 rows as on slide 22
      • Reconciliation Node Attributes: this could hold type of node, distance to host child node etc.
      map_id   term   value
      1001     1      leaf
      1002     1      leaf
      1003     1      leaf
      1013     1      duplication
      1014     1      duplication
      1016     1      speciation
      1017     1      speciation
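To show how the two tables on this slide might be queried together, here is a small sketch using Python's built-in `sqlite3` module. It is an illustration only: the table names `reconciliation_map` and `reconciliation_attribute` are hypothetical, SQLite stands in for whatever database engine the project actually uses, and only a handful of the slide's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reconciliation_map (
        map_id      INTEGER PRIMARY KEY,
        node_id     INTEGER,
        host_parent TEXT,
        host_child  TEXT
    );
    CREATE TABLE reconciliation_attribute (
        map_id INTEGER REFERENCES reconciliation_map (map_id),
        term   INTEGER,
        value  TEXT
    );
    """
)

# A subset of the rows shown on the slide.
conn.executemany(
    "INSERT INTO reconciliation_map VALUES (?, ?, ?, ?)",
    [(1001, 1, "A", "A"), (1013, 13, "D", "B"), (1014, 14, "D", "B"),
     (1016, 16, "D", "D"), (1017, 17, "D", "D")],
)
conn.executemany(
    "INSERT INTO reconciliation_attribute VALUES (?, ?, ?)",
    [(1001, 1, "leaf"), (1013, 1, "duplication"), (1014, 1, "duplication"),
     (1016, 1, "speciation"), (1017, 1, "speciation")],
)

# Find the gene tree nodes tagged as duplications and where they map.
duplications = conn.execute(
    """
    SELECT m.node_id, m.host_parent, m.host_child
    FROM reconciliation_map AS m
    JOIN reconciliation_attribute AS a ON a.map_id = m.map_id
    WHERE a.value = 'duplication'
    ORDER BY m.node_id
    """
).fetchall()
print(duplications)
```

Keeping the attributes in their own table means new kinds of tag can be added as rows, without changing the mapping table itself.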
    • 24. Extending Cyberinfrastructure [diagram: Ontology (TRON); Species Trees; Reconciled; Gene Trees; Functional Annotation (annot8r); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 25. Gene Ontology Terms
      • Gene Ontology terms are assigned to individual gene models using annot8r
      • Additional tools are provided to import annot8r output into the database
    • 26. Extending Cyberinfrastructure [diagram: Ontology (TRON); Species Trees; Reconciled; Gene Trees; Functional Annotation (annot8r); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 27. Populating the Database
      • Currently support TreeBeST reconciliations in the analysis pipeline
      • Added PRIME support to BioPerl Bio::TreeIO to support import of TreeBeST output into the database
      ((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])snode0 [&&PRIME ID=11 D=0 AC=(10)],
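For readers unfamiliar with the `[&&PRIME ...]` annotations in the fragment above, the following Python sketch pulls the key/value fields out of each tag with regular expressions. It is not the BioPerl implementation, and the function name `parse_prime_tags` is hypothetical. The slide does not define the fields (`ID`, `S`, `D`, `AC`), so the code does not interpret them; it only assumes that parenthesised values are space-separated integer lists, as they appear in the fragment.

```python
import re
from typing import Dict, List, Union

TAG = re.compile(r"\[&&PRIME\s+([^\]]*)\]")      # one [&&PRIME ...] annotation
FIELD = re.compile(r"(\w+)=(\([^)]*\)|\S+)")     # key=value or key=(v1 v2 ...)

Value = Union[str, List[int]]


def parse_prime_tags(newick: str) -> List[Dict[str, Value]]:
    """Return one dict of fields per PRIME tag, in order of appearance."""
    tags: List[Dict[str, Value]] = []
    for body in TAG.findall(newick):
        fields: Dict[str, Value] = {}
        for key, value in FIELD.findall(body):
            if value.startswith("("):
                # Parenthesised list, e.g. AC=(6 8 9 10)
                fields[key] = [int(v) for v in value[1:-1].split()]
            else:
                fields[key] = value
        tags.append(fields)
    return tags


# The truncated fragment from the slide, kept as-is.
fragment = (
    "((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],"
    "((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],"
    "V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])snode0 [&&PRIME ID=11 D=0 AC=(10)],"
)
for tag in parse_prime_tags(fragment):
    print(tag)
```

A full importer would also need to parse the Newick tree structure itself; this sketch only covers the annotation tags.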
    • 28. Extending Cyberinfrastructure [diagram: Ontology (TRON); Species Trees; Reconciled; Gene Trees; Functional Annotation (annot8r); Query Functions; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
    • 29. Tree Reconciliation GUI
    • 30. Tree Reconciliation GUI
    • 31. Tree Reconciliation GUI
    • 32. Tree Reconciliation GUI
    • 33. Tree Reconciliation GUI
      • Queries
      • BLAST
      • GO Term
      • Locus Name
      • Gene Family Name
    • 34. Extending Cyberinfrastructure [diagram: Ontology (TRON); Species Trees; Reconciled; Gene Trees; Functional Annotation (annot8r); Query Functions; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest); >high_throughput.pl]
    • 35. Current Development
      • Extending analysis pipeline to additional reconciliation software (primeGSR/Notung)
      • Evaluating accuracy of reconciliation software compared to synteny informed reconstruction
      • XML Representation of reconciled trees
      • Further refinement of GUI and integration with iPlant Discovery Environment
    • 36. Availability
      • Repository
      • http://tinyurl.com/iPlantOS
      • Iplant-treerec – Back end services
      • Tr-standalone – TR Viewer
      • Documentation
      • http://tinyurl.com/TRDocs