Talevich bosc2010 bio-phylo

744 views
657 views

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
744
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Talevich bosc2010 bio-phylo

  1. 1. Bio.Phylo A unified phylogenetics toolkit for Biopython Eric Talevich Institute of Bioinformatics University of Georgia June 29, 2010
  2. 2. Abstract Bio.Phylo is a new phylogenetics library for: • Exploring, modifying and annotating trees • Reading & writing standard file formats • Quick visualization • Gluing together computational pipelines Availability: Biopython 1.54
  3. 3. A quick survey of file formats Newick (a.k.a. New Hampshire) is a simple nested-parens format: (A, (B, C), (D, E)) • Extended & tweaked, led to NHX (and parsing problems) Nexus is a collection of formats, including Newick trees • More than just tree data. . . still tough to parse PhyloXML is an XML-based replacement for NHX • Annotations formalized as XML elements; extensible with user-defined element types NeXML is an XML-based successor to Nexus • Ontology-based — key-value assignments have semantic meaning
  4. 4. Demo: What’s in a tree? 1. Read a simple Newick file 4. Promote to a PhyloXML tree 2. Inspect through IPython 5. Set branch colors 3. Draw with 6. Write a PhyloXML file PyLab/matplotlib
  5. 5. # In a terminal, make a simple Newick file # Then launch the IPython interpreter and read the file % cat > simple.dnd <<EOF > (((A,B),(C,D)),(E,F,G)) > EOF % ipython -pylab >>> from Bio import Phylo >>> tree = Phylo.read(’simple.dnd’, ’newick’)
  6. 6. # String representation shows the object structure >>> print tree Tree(weight=1.0, rooted=False, name=’’) Clade(branch_length=1.0) Clade(branch_length=1.0) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’A’) Clade(branch_length=1.0, name=’B’) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’C’) Clade(branch_length=1.0, name=’D’) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’E’) Clade(branch_length=1.0, name=’F’) Clade(branch_length=1.0, name=’G’)
  7. 7. # Draw an ASCII-art dendrogram >>> Phylo.draw_ascii(tree, column_width=52) ______________ A ______________| | |______________ B ______________| | | ______________ C | |______________| _| |______________ D | | ______________ E | | |______________|______________ F | |______________ G
  8. 8. >>> tree.rooted = True >>> Phylo.draw graphiz(tree) D A C B G E F
  9. 9. # Promote a basic tree to PhyloXML >>> from Bio.Phylo.PhyloXML import Phylogeny >>> phy = Phylogeny.from_tree(tree) >>> print phy Phylogeny(rooted=True, name=’’) Clade(branch_length=1.0) Clade(branch_length=1.0) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’A’) Clade(branch_length=1.0, name=’B’) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’C’) Clade(branch_length=1.0, name=’D’) Clade(branch_length=1.0) Clade(branch_length=1.0, name=’E’) Clade(branch_length=1.0, name=’F’) Clade(branch_length=1.0, name=’G’)
  10. 10. Branch color >>> phy.root.color = (128, 128, 128) Or: >>> phy.root.color = ’#808080’ Or: >>> phy.root.color = ’gray’ Find clades by attribute values: >>> mrca = phy.common ancestor({’name’:’E’}, {’name’:’F’}) >>> mrca.color = ’salmon’ Directly index a clade: >>> phy.clade[0,1].color = ’blue’ >>> Phylo.draw graphviz(phy, prog=’neato’)
  11. 11. D B C A G F E
  12. 12. # Save the color annotations in phyloXML >>> Phylo.write(phy, ’simple-color.xml’, ’phyloxml’) <phy:phyloxml xmlns:phy="http://www.phyloxml.org"> <phylogeny rooted="true"> <clade> <branch_length>1.0</branch_length> <color> <red>128</red> <green>128</green> <blue>128</blue> </color> <clade> <branch_length>1.0</branch_length> <clade> <branch_length>1.0</branch_length> <clade> <name>A</name> ...
  13. 13. Thanks Holla: • Brad Chapman and Christian Zmasek, GSoC 2009 mentors • The Biopython developers, feat. Peter J. A. Cock, Frank Kauff & Cymon J. Cox • Hilmar Lapp & the NESCent Phyloinformatics program • Google’s Open Source Programs Office • My professor, Dr. Natarajan Kannan • Developers like you
  14. 14. Q&A • Which 3rd-party applications should we wrap in Bio.Phylo.Applications? (e.g. RAxML, MrBayes) • Which other libraries should we support interoperability with? (PyCogent, ape) • What other algorithms are simple, stable and relevant? (Consensus, rooting) • Features for systematics? (Geography, PopGen integration?)
  15. 15. Extra: Tree methods >>> dir(tree) collapse get terminals collapse all is bifurcating common ancestor is monophyletic count terminals is parent of depths is preterminal distance ladderize find any prune find clades split find elements total branch length get nonterminals trace get path See: http://biopython.org/DIST/docs/api/Bio.Phylo. BaseTree.TreeMixin-class.html
  16. 16. Extra: The Bio.Phylo class hierarchy Figure: Inheritance relationship among the core classes
  17. 17. Extra: PhyloXML classes $ pydoc Bio.Phylo.PhyloXML Accession Date Point Alphabet Distribution Polygon Annotation DomainArchitecture Property BaseTree Events ProteinDomain BinaryCharacters Id Reference BranchColor MolSeq Sequence Clade Other SequenceRelation CladeRelation Phylogeny Taxonomy Confidence Phyloxml Uri See: http://biopython.org/wiki/PhyloXML

×