Overview and demo of a new Biopython package for phylogenetics -- drawing and annotating evolutionary trees. Presented at the Bioinformatics Open Source Conference (BOSC) 2010.
1. Bio.Phylo
A unified phylogenetics toolkit for Biopython
Eric Talevich
Institute of Bioinformatics
University of Georgia
June 29, 2010
2. Abstract
Bio.Phylo is a new phylogenetics library for:
• Exploring, modifying and annotating trees
• Reading & writing standard file formats
• Quick visualization
• Gluing together computational pipelines
Availability: Biopython 1.54
3. A quick survey of file formats
Newick (a.k.a. New Hampshire) is a simple nested-parens format: (A, (B, C), (D, E))
• Extended & tweaked, led to NHX (and parsing problems)
Nexus is a collection of formats, including Newick trees
• More than just tree data... still tough to parse
PhyloXML is an XML-based replacement for NHX
• Annotations formalized as XML elements; extensible with user-defined element types
NeXML is an XML-based successor to Nexus
• Ontology-based: key-value assignments have semantic meaning
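To make the "nested-parens" idea concrete, here is a minimal sketch of a recursive-descent reader for bare Newick strings (leaf names only, no branch lengths, labels or comments). It is an illustration only, not how Bio.Phylo parses Newick; the function name parse_newick is hypothetical.

```python
def parse_newick(text):
    """Parse a bare Newick string (leaf names only) into nested lists.

    Illustrative sketch: no branch lengths, quoted names or comments.
    """
    tokens = text.replace(" ", "").rstrip(";")
    pos = 0

    def peek():
        # Return the current character, or "" at end of input.
        return tokens[pos] if pos < len(tokens) else ""

    def parse_clade():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_clade()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_clade())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # consume ")"
            return children
        # Otherwise this is a leaf: read its name up to the next delimiter.
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        return tokens[start:pos]

    tree = parse_clade()
    if pos != len(tokens):
        raise ValueError("unexpected trailing text at position %d" % pos)
    return tree


nested = parse_newick("(A, (B, C), (D, E))")
print(nested)
```

Each pair of parentheses becomes one list, so the nesting of the lists mirrors the nesting of the clades.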
4. Demo: What’s in a tree?
1. Read a simple Newick file
2. Inspect through IPython
3. Draw with PyLab/matplotlib
4. Promote to a PhyloXML tree
5. Set branch colors
6. Write a PhyloXML file
5. # In a terminal, make a simple Newick file
# Then launch the IPython interpreter and read the file
% cat > simple.dnd <<EOF
> (((A,B),(C,D)),(E,F,G))
> EOF
% ipython -pylab
>>> from Bio import Phylo
>>> tree = Phylo.read('simple.dnd', 'newick')
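The same tree can also be read without creating a file first. The sketch below assumes a recent Biopython on Python 3 and passes an in-memory handle to Phylo.read; the variable names are illustrative.

```python
from io import StringIO

from Bio import Phylo

# Same topology as simple.dnd, held in a string instead of a file.
newick_text = "(((A,B),(C,D)),(E,F,G));"
tree = Phylo.read(StringIO(newick_text), "newick")

# Collect the leaf (terminal) names in the order the tree lists them.
leaf_names = [leaf.name for leaf in tree.get_terminals()]
print(leaf_names)
print(tree.count_terminals())
```

Phylo.read expects exactly one tree in the input; for files holding several trees, Phylo.parse returns an iterator instead.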
7. # Draw an ASCII-art dendrogram
>>> Phylo.draw_ascii(tree, column_width=52)
                                ______________ A
                 ______________|
                |              |______________ B
  ______________|
 |              |               ______________ C
 |              |______________|
_|                             |______________ D
 |
 |               ______________ E
 |              |
 |______________|______________ F
                |
                |______________ G
12. # Save the color annotations in phyloXML
>>> Phylo.write(phy, 'simple-color.xml', 'phyloxml')
<phy:phyloxml xmlns:phy="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <branch_length>1.0</branch_length>
      <color>
        <red>128</red>
        <green>128</green>
        <blue>128</blue>
      </color>
      <clade>
        <branch_length>1.0</branch_length>
        <clade>
          <branch_length>1.0</branch_length>
          <clade>
            <name>A</name>
            ...
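A rough sketch of the promote, color and write steps in one place, assuming a recent Biopython on Python 3. The temporary directory and the choice of gray for the root branch are illustrative assumptions, not necessarily what the demo used.

```python
import os
import tempfile
from io import StringIO

from Bio import Phylo

# Start from the plain Newick tree.
tree = Phylo.read(StringIO("(((A,B),(C,D)),(E,F,G));"), "newick")

# Promote to a PhyloXML tree so annotations can be saved to phyloXML.
phy = tree.as_phyloxml()

# Color the root branch; named colors are converted to RGB values.
phy.root.color = "gray"

# Write to a scratch location (illustrative path), then read it back.
out_path = os.path.join(tempfile.mkdtemp(), "simple-color.xml")
Phylo.write(phy, out_path, "phyloxml")

reloaded = Phylo.read(out_path, "phyloxml")
root_color = reloaded.root.color
root_rgb = (root_color.red, root_color.green, root_color.blue)
print(root_rgb)
```

Reading the file back is a quick check that the color survived the round trip.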
13. Thanks
Holla:
• Brad Chapman and Christian Zmasek, GSoC 2009 mentors
• The Biopython developers, feat. Peter J. A. Cock, Frank Kauff & Cymon J. Cox
• Hilmar Lapp & the NESCent Phyloinformatics program
• Google’s Open Source Programs Office
• My professor, Dr. Natarajan Kannan
• Developers like you
14. Q&A
• Which 3rd-party applications should we wrap in Bio.Phylo.Applications? (e.g. RAxML, MrBayes)
• Which other libraries should we support interoperability with? (PyCogent, ape)
• What other algorithms are simple, stable and relevant? (Consensus, rooting)
• Features for systematics? (Geography, PopGen integration?)
15. Extra: Tree methods
>>> dir(tree)
collapse             get_terminals
collapse_all         is_bifurcating
common_ancestor      is_monophyletic
count_terminals      is_parent_of
depths               is_preterminal
distance             ladderize
find_any             prune
find_clades          split
find_elements        total_branch_length
get_nonterminals     trace
get_path
See: http://biopython.org/DIST/docs/api/Bio.Phylo.BaseTree.TreeMixin-class.html
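A few of these methods in action on the demo tree. This is a sketch assuming a recent Biopython on Python 3; the variable names are illustrative.

```python
from io import StringIO

from Bio import Phylo

tree = Phylo.read(StringIO("(((A,B),(C,D)),(E,F,G));"), "newick")

# Most recent common ancestor of two leaves, looked up by name.
ancestor = tree.common_ancestor("A", "D")
ancestor_leaves = [leaf.name for leaf in ancestor.get_terminals()]

# Clades on the path from the root (excluded) down to leaf C.
path_length = len(tree.get_path("C"))

# is_monophyletic() compares clade objects, so look the leaves up first.
leaf = {name: tree.find_any(name=name) for name in "ABC"}
ab_is_clade = bool(tree.is_monophyletic([leaf["A"], leaf["B"]]))
ac_is_clade = bool(tree.is_monophyletic([leaf["A"], leaf["C"]]))

# The (E,F,G) node has three children, so the tree is not strictly binary.
strictly_binary = tree.is_bifurcating()

print(ancestor_leaves, path_length, ab_is_clade, ac_is_clade, strictly_binary)
```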
16. Extra: The Bio.Phylo class hierarchy
Figure: Inheritance relationship among the core classes
17. Extra: PhyloXML classes
$ pydoc Bio.Phylo.PhyloXML
Accession          Date                 Point
Alphabet           Distribution         Polygon
Annotation         DomainArchitecture   Property
BaseTree           Events               ProteinDomain
BinaryCharacters   Id                   Reference
BranchColor        MolSeq               Sequence
Clade              Other                SequenceRelation
CladeRelation      Phylogeny            Taxonomy
Confidence         Phyloxml             Uri
See: http://biopython.org/wiki/PhyloXML
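These classes can also be instantiated directly to build an annotated tree from scratch. A minimal sketch, assuming a recent Biopython on Python 3; the toy two-leaf tree and its name are made up for illustration.

```python
from Bio.Phylo import PhyloXML

# Build a toy two-leaf tree by hand (illustrative data only).
leaf_a = PhyloXML.Clade(name="A", branch_length=1.0)
leaf_b = PhyloXML.Clade(name="B", branch_length=1.0)
root = PhyloXML.Clade(clades=[leaf_a, leaf_b])

# BranchColor stores red/green/blue values in the range 0-255.
root.color = PhyloXML.BranchColor(128, 128, 128)

phylogeny = PhyloXML.Phylogeny(root=root, rooted=True, name="toy")

color_hex = root.color.to_hex()
n_leaves = phylogeny.count_terminals()
print(color_hex, n_leaves)
```

A Phylogeny built this way supports the same tree methods as one read from a file.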