# Phylogenetics in R

Talk given on 18 Nov, 2011 on doing phylogenetics in R.

Published in: Technology
### Transcript

• 1. Phylogenetics in R Scott Chamberlain November 18, 2011
• 2. What sorts of phylogenetics things can I do in R?
• 3. The run down
• Get sequence data
• Align sequence data
• Phylogenetic inference
• NJ, maxlik, parsimony, Bayesian, UPGMA
• Visualize phylogenies
• Traits on trees
• Phylogenetic signal
• Trait evolution
• Ancestral character state reconstruction
• Tree simulations
• Get trees
• Phylogenetic community structure
• Bonus stuff: polytomy resolver
• 4. Basic trees in R
• Example
• require(ape)
• tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
• tr1 # print tree summary
• write.tree(tr1) # print tree in Newick format: "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
• tr1$tip.label # tip labels: "B" "C" "D" "A"
• tr1$edge.length # edge lengths: 0.04 0.01 0.05 0.05 0.06 0.10
• tr1$node.label # node labels: NULL (meaning no node labels)
• # Assign properties to trees
• tr1$tip.label <- c('sleepy','happy','grumpy','frumpy') # label tips
• tr1$tip.label # did it work? "sleepy" "happy" "grumpy" "frumpy"
• Etcetera for other tree properties
• 5. Get sequence data
• # install and load ape
• install.packages("ape"); require(ape)
• # get data from Genbank
• # make vector of accession numbers, for ITS 1 and 2 region for Gossypium (cotton) species
• cotton_acc <- c("U56806", "U12712", "U56810",
• "U12732", "U12725", "U56786", "U12715",
• "AF057758", "U56790", "U12716", "U12729",
• "U56798", "U12727", "U12713", "U12719",
• "U56811", "U12728", "U12730", "U12731",
• "U12722", "U56796", "U12714", "U56789",
• "U56797", "U56801", "U56802", "U12718",
• "U12710", "U56804", "U12734", "U56809",
• "U56812", "AF057753", "U12711", "U12717",
• "U12723", "U12726")
• # download the sequences from GenBank
• cotton <- read.GenBank(cotton_acc, species.names = TRUE)
• # name the sequences with species names instead of accession numbers
• names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
• names(cotton) <- attr(cotton, "species")
• 6. Align sequence data: run external tools (ClustalW, MAFFT)
• # multiple sequence alignment
• # Get ClustalW here and install it: http://www.clustal.org/
• # set your working directory to the ClustalW2 folder
• setwd("/path on your computer to/ClustalW2")
• # write fasta file to directory
• write.dna(cotton, "cotton.fas", format = "fasta")
• # run clustal multiple alignment, prints clustal output to console
• system('"./clustalw2" cotton.fas') # should work on OSX or Windows
• # read the alignment back in to R
• cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")
• Manual alignment may have to be done, dare I say it, outside of R
• 7. Get and align sequences DIY
• Get together with a few other people…or not
• Choose some species to investigate
• Get their accession numbers on GenBank
• If you are really adventurous, also align sequences
• 8. Phylogenetic inference Tools
• R Packages: ape, phangorn, phyclust, phytools, scaleboot
• ape has the most functionality for phylogenetic inference
• You should be able to call MrBayes from R, but I don’t know how; maybe with the package phyloch?
• 9. Phylogenetic inference
• Fitting evolutionary models: see the function modelTest in package phangorn
• NJ
• install.packages("ape"); require(ape)
• data(woodmouse)
• trw <- nj(dist.dna(woodmouse))
• plot(trw)
• Maximum likelihood
• install.packages("phangorn"); require(phangorn)
• data(Laurasiatherian)
• dm <- dist.logDet(Laurasiatherian)
• njtree <- NJ(dm)
• MLfit <- pml(njtree, Laurasiatherian) # likelihood of the data on the NJ tree
• MLfit_ <- optim.pml(MLfit, model = "GTR") # optimize edge lengths under a GTR model
• MLfit_$tree
• plot(MLfit_$tree)
• Parsimony
• install.packages("phangorn"); require(phangorn)
• data(Laurasiatherian)
• dm <- dist.logDet(Laurasiatherian)
• tree <- NJ(dm)
• treepars <- optim.parsimony(tree, Laurasiatherian)
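The slide points to modelTest for fitting evolutionary models but gives no code. A minimal sketch of how it might be used, assuming the same Laurasiatherian data and NJ starting tree as in the maximum likelihood example. The choice of candidate models and the G = FALSE, I = FALSE settings are my own, picked to keep the run short.

```r
# Sketch only: compare a few substitution models before an ML fit.
library(phangorn)

data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
njtree <- NJ(dm)

# modelTest() evaluates each candidate model on the supplied starting tree.
# G = FALSE and I = FALSE skip the gamma and invariant-site variants
# (an assumption made here for speed; drop them to test those variants too).
mt <- modelTest(Laurasiatherian, njtree,
                model = c("JC", "HKY", "GTR"), G = FALSE, I = FALSE)

# Rank the candidate models by AIC (lower is better)
mt[order(mt$AIC), c("Model", "df", "logLik", "AIC", "BIC")]
```

The best-ranked model name could then be passed as the model argument of optim.pml.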
• 10. Phylogenetic inference, continued
• Bayesian
• You can do this (maybe) with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R…
• … however, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), FYI
• 11. Phylogenetic inference DIY
• With your partners…or not
• Use the sequence data from GenBank you got earlier
• (if you didn’t align the sequences, don’t worry about it – OR use data set provided with ape or other package)
• Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)
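One way the exercise above might look, as a rough sketch. It uses the woodmouse alignment shipped with ape in place of your own GenBank data. The comparison by parsimony score and Robinson-Foulds distance is my own choice of summary, not something from the talk.

```r
# Sketch only: infer a tree two ways and compare the results.
library(ape)
library(phangorn)

data(woodmouse)                         # aligned DNA sequences from ape
nj_tree <- nj(dist.dna(woodmouse))      # neighbor joining on pairwise distances

wm <- as.phyDat(woodmouse)              # convert to phangorn's data format
pars_tree <- optim.parsimony(nj_tree, wm, trace = 0)  # search from the NJ tree

score_nj   <- parsimony(nj_tree, wm)    # parsimony score of the NJ tree
score_pars <- parsimony(pars_tree, wm)  # score after the parsimony search
c(NJ = score_nj, parsimony = score_pars)

# Robinson-Foulds distance: 0 means the two topologies are identical
RF.dist(nj_tree, pars_tree)
```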
• 12. Visualize phylogenies
• R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo
• # visualize phylogenies
• install.packages("ape")
• require(ape)
• tree <- rcoal(10)
• tree
• plot(tree)
• plot(tree, type = "cladogram")
• plot(tree, type = "unrooted")
• plot(tree, type = "radial")
• plot(tree, type = "fan")
• 13. Visualize phylogenies DIY
• Get together with a few other people…or not
• Use the tree you made, or use one provided with ape, or other packages
• Do basic plotting, e.g.: plot(mytree)
• Then see if you can
• color the branches,
• label the branches with the edge lengths
• change the tip labels
• etc.
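If you get stuck on the exercise, here is a minimal sketch of one possible answer using ape. The tree is random, and the tip names and colors are made up for illustration.

```r
# Sketch only: color branches, label them with edge lengths, rename tips.
library(ape)

set.seed(1)
mytree <- rcoal(6)

# Change the tip labels (hypothetical names)
mytree$tip.label <- paste0("sp_", seq_along(mytree$tip.label))

# One color per edge, in the row order of mytree$edge
edge_cols <- rep(c("black", "red"), length.out = nrow(mytree$edge))

plot(mytree, edge.color = edge_cols, edge.width = 2)

# Write each branch's length next to it
edgelabels(round(mytree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)
```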
• 14. Traits on trees phylogenetic signal
• R Packages: ape, picante, caper, phytools
• Examples from picante and phytools:
• # phylogenetic signal
• install.packages("picante")
• require(picante)
• randtree <- rcoal(20)
• randtraits <- rTraitCont(randtree)
• Kcalc(randtraits[randtree$tip.label],randtree)
• install.packages("phytools")
• require(phytools)
• tree <- rbdtree(1,0,Tmax=4) # make a tree
• x <- fastBM(tree) # simulate traits
• phylosig(tree, x, method="lambda", test=TRUE) # calculate phylogenetic signal, lambda
• phylosig(tree, x, method="K", test=TRUE) # calculate phylogenetic signal, K
• 15. Traits on trees modeling trait evolution
• R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
• The packages above can model the evolution of discrete and continuous traits, under Brownian motion or OU models
• Rbrownie
• Various evolutionary modeling frameworks in development, to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa
• here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
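To make the Brownian motion versus OU comparison concrete, here is a small sketch using fitContinuous from geiger. It assumes a geiger version where fitContinuous takes a model argument and returns fit statistics under $opt. The tree and trait are simulated, so the numbers mean nothing biologically.

```r
# Sketch only: fit BM and OU models to one continuous trait and compare AICc.
library(ape)
library(geiger)

set.seed(1)
tree <- rcoal(30)
x <- rTraitCont(tree, model = "BM")   # simulated trait, named by tip label

fit_bm <- fitContinuous(tree, x, model = "BM")
fit_ou <- fitContinuous(tree, x, model = "OU")

# Lower AICc indicates the better-supported model
c(BM = fit_bm$opt$aicc, OU = fit_ou$opt$aicc)
```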
• 16. Ancestral state reconstruction
• R Packages: ape, ouch, phytools
• Function ‘ace’ in the ape package works nicely
• But very sensitive to parameters
• Example
• require(ape)
• data(bird.orders)
• x <- rnorm(23) # one random trait value per tip (bird.orders has 23 tips)
• out <- ace(x, bird.orders)
• out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
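A sketch of one way to match the values in out$ace to nodes. It relies on two assumptions: ape numbers internal nodes from Ntip + 1 to Ntip + Nnode, and out$ace comes back in that node order. Check names(out$ace) on your own output to confirm.

```r
# Sketch only: pair ancestral estimates with node numbers and plot them.
library(ape)

data(bird.orders)
set.seed(42)
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label       # tie trait values to tips by name
out <- ace(x, bird.orders)

# Internal node numbers (assumed to match the order of out$ace)
node_ids <- Ntip(bird.orders) + seq_len(Nnode(bird.orders))
anc <- data.frame(node = node_ids, estimate = as.numeric(out$ace))
head(anc)

# Show the estimates on the tree at the corresponding nodes
plot(bird.orders, cex = 0.6)
nodelabels(round(anc$estimate, 2), node = anc$node, frame = "none", cex = 0.6)
```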
• 17. Tree simulations
• R Packages: TreeSim, geiger, ape, phybase
• Example
• require(ape)
• tree <- rcoal(10) # Make a random tree
• trait <- rTraitCont(tree, model = "BM") # Simulate a trait on that tree
• # Write a function to make a tree, simulate a BM trait, and take the mean of that trait
• myfunc <- function(n) {
• tree <- rcoal(n)
• trait <- rTraitCont(tree, model = "BM")
• mean(trait)
• }
• # do it 100 times and make a data.frame required for ggplot2 plotting
• dat <- replicate(100, myfunc(10))
• dat2 <- data.frame(dat)
• # plot results
• require(ggplot2)
• ggplot(dat2, aes(dat)) + geom_histogram()
• 18. Get trees
• rOpenSci’s treeBASE package
• on CRAN: http://cran.r-project.org/web/packages/treebase/
• install.packages("treebase") # install
• require(treebase) # load
• tree <- search_treebase("Derryberry", "author")[[1]] # search
• plot(tree) # plot the tree
• 19. Phylogenetic community structure
• R Packages: picante (includes phylocom functionality)
• Although not bladj, for some reason; talk to me if you want to run bladj from R
• Example
• Function ‘comdistnt’ calculates inter-community mean nearest taxon distance
• require(picante)
• data(phylocom)
• comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted=FALSE)
• Also, a new approach to phylogenetic community structure in R from Matt Helmus, code here:
• http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
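As a companion to the comdistnt example, here is a brief sketch of two within-community measures from picante: mean pairwise distance (mpd) and mean nearest taxon distance (mntd). The talk only showed comdistnt, so the choice of these two functions is mine.

```r
# Sketch only: within-community phylogenetic distances on the phylocom data.
library(picante)

data(phylocom)
phydist <- cophenetic(phylocom$phylo)   # pairwise distances between tips

# One value per community (row of the sample matrix)
mpd_vals  <- mpd(phylocom$sample, phydist, abundance.weighted = FALSE)
mntd_vals <- mntd(phylocom$sample, phydist, abundance.weighted = FALSE)

data.frame(community = rownames(phylocom$sample),
           mpd = mpd_vals, mntd = mntd_vals)
```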
• 20. Bonus: Polytomy resolver
• MEE paper: “A simple polytomy resolver for dated phylogenies”
• by Kuhn, Mooers, and Thomas
• Paper
• http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
• Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
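For a quick feel of what resolving a polytomy means, here is a sketch using multi2di from ape. Note that this is not the method from the Kuhn, Mooers, and Thomas paper: multi2di is a much simpler technique that resolves polytomies at random by adding zero-length branches. The toy tree is made up.

```r
# Sketch only: random polytomy resolution with ape (not the paper's method).
library(ape)

# Toy tree where A, B and C form a polytomy
tr <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
Nnode(tr)               # fewer internal nodes than a fully resolved tree

set.seed(1)
tr_res <- multi2di(tr)  # resolve at random, inserting zero-length branches

# A fully resolved rooted tree has (number of tips - 1) internal nodes
Nnode(tr_res) == Ntip(tr_res) - 1
```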
• 21. Resources
• Bodega Phylogenetics Wiki:
• Home: http://bodegaphylo.wikispot.org/Front_Page
• BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
• Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
• R phylo-wiki (from NESCent):
• http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
• CRAN task view, Phylogenetics:
• http://cran.r-project.org/web/views/Phylogenetics.html
• rmesquite: https://r-forge.r-project.org/R/?group_id=213
• R-phylogenetics listserv (r-sig-phylo):
• https://stat.ethz.ch/mailman/options/r-sig-phylo/