Phylogenetics in R Scott Chamberlain November 18, 2011

Talk given on 18 Nov, 2011 on doing phylogenetics in R.


  1. Phylogenetics in R. Scott Chamberlain, November 18, 2011
  2. What sorts of phylogenetics things can I do in R?
  3. The rundown
     - Get sequence data
     - Align sequence data
     - Phylogenetic inference
       - NJ, maximum likelihood, parsimony, Bayesian, UPGMA
     - Visualize phylogenies
     - Traits on trees
       - Phylogenetic signal
       - Trait evolution
       - Ancestral character state reconstruction
     - Tree simulations
     - Get trees
     - Phylogenetic community structure
     - Bonus stuff: polytomy resolver
  4. Basic trees in R

     Example:

         require(ape)
         tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
         tr1                # print tree summary
         write.tree(tr1)    # print tree in Newick format
         # "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
         tr1$tip.label      # tip labels
         # "B" "C" "D" "A"
         tr1$edge.length    # edge lengths
         # 0.04 0.01 0.05 0.05 0.06 0.10
         tr1$node.label     # node labels
         # NULL (meaning the tree has no node labels)

         # Assign properties to trees
         tr1$tip.label <- c("sleepy", "happy", "grumpy", "frumpy")  # label tips
         tr1$tip.label      # did it work?
         # "sleepy" "happy" "grumpy" "frumpy"

     And so on for other tree properties.
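A few more ways to inspect the same tree can be sketched with other functions from ape. This is only an illustration: the node names (`root`, `n1`, `n2`) and the object name `tr2` are made up here, not taken from the talk.

```r
library(ape)

tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")

n_tips  <- Ntip(tr1)       # number of tips
n_nodes <- Nnode(tr1)      # number of internal nodes
rooted  <- is.rooted(tr1)  # does the tree have a root?

# Internal nodes are numbered Ntip + 1, Ntip + 2, ...; the first is the root.
# The labels below are hypothetical names chosen for this sketch.
tr1$node.label <- c("root", "n1", "n2")

# Remove one tip and keep the rest of the tree
tr2 <- drop.tip(tr1, "A")

print(c(tips = n_tips, nodes = n_nodes))
print(write.tree(tr1))     # Newick string now carries the node labels
print(tr2$tip.label)
```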
  5. Get sequence data

         # install and load ape
         install.packages("ape"); require(ape)

         # make a vector of GenBank accession numbers for the ITS 1 and 2
         # region of Gossypium (cotton) species
         cotton_acc <- c("U56806", "U12712", "U56810",
           "U12732", "U12725", "U56786", "U12715",
           "AF057758", "U56790", "U12716", "U12729",
           "U56798", "U12727", "U12713", "U12719",
           "U56811", "U12728", "U12730", "U12731",
           "U12722", "U56796", "U12714", "U56789",
           "U56797", "U56801", "U56802", "U12718",
           "U12710", "U56804", "U12734", "U56809",
           "U56812", "AF057753", "U12711", "U12717",
           "U12723", "U12726")

         # get data from GenBank
         cotton <- read.GenBank(cotton_acc, species.names = TRUE)

         # name the sequences with species names instead of accession numbers,
         # keeping a lookup table of species and accession numbers
         names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
         names(cotton) <- attr(cotton, "species")
  6. Align sequence data: run external programs (Clustal, MAFFT)

         # multiple sequence alignment
         # get ClustalW here, and install it: http://www.clustal.org/

         # set the working directory to your ClustalW2 folder
         setwd("/path on your computer to/ClustalW2")

         # write a FASTA file to that directory
         write.dna(cotton, "cotton.fas", format = "fasta")

         # run the Clustal multiple alignment; prints Clustal output to the console
         system('"./clustalw2" cotton.fas')  # should work on OSX or Windows

         # read the alignment back into R
         cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")

     Manual alignment may have to be done, dare I say it, not in R.
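Once an alignment is back in R, a quick sanity check is worth doing before any inference. The sketch below uses the `woodmouse` alignment that ships with ape as a stand-in for `cotton_clustalaligned`, so it runs without ClustalW or a network connection. The variable names are illustrative only.

```r
library(ape)

data(woodmouse)    # aligned DNAbin matrix bundled with ape
aln <- woodmouse   # stand-in for an alignment read with read.dna()

aln_dim   <- dim(aln)        # number of sequences x number of alignment columns
var_sites <- seg.sites(aln)  # indices of variable (segregating) columns
freqs     <- base.freq(aln)  # overall base frequencies

print(aln_dim)
print(length(var_sites))
print(round(freqs, 3))
```

An alignment with the wrong number of rows, or with almost no variable columns, usually points to a problem upstream (wrong file, failed alignment run).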
  7. Get and align sequences: DIY
     - Get together with a few other people... or not
       - Choose some species to investigate
       - Get their accession numbers on GenBank
       - Download sequence data from GenBank
       - If you are really adventurous, also align the sequences
  8. Phylogenetic inference: tools
     - R packages: ape, phangorn, phyclust, phytools, scaleboot
     - ape has the most functionality for phylogenetic inference
     - You should be able to call MrBayes from R, but I don't know how. Package phyloch?
  9. Phylogenetic inference
     - Fitting evolutionary models: see the function modelTest in package phangorn
     - NJ

           install.packages("ape"); require(ape)
           data(woodmouse)
           trw <- nj(dist.dna(woodmouse))
           plot(trw)

     - Maximum likelihood

           install.packages("phangorn"); require(phangorn)
           data(Laurasiatherian)
           dm <- dist.logDet(Laurasiatherian)
           njtree <- NJ(dm)                           # starting tree
           MLfit <- pml(njtree, Laurasiatherian)      # likelihood of the starting tree
           MLfit_ <- optim.pml(MLfit, model = "GTR")  # optimize edge lengths
           MLfit_$tree
           plot(MLfit_$tree)

     - Parsimony

           install.packages("phangorn"); require(phangorn)
           data(Laurasiatherian)
           dm <- dist.logDet(Laurasiatherian)
           tree <- NJ(dm)
           treepars <- optim.parsimony(tree, Laurasiatherian)
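A common follow-up to the NJ example is to ask how well the data support each clade. The sketch below uses `boot.phylo()` from ape for a nonparametric bootstrap. The helper name `nj_fun` and the choice of 100 replicates are assumptions made for illustration, not part of the talk.

```r
library(ape)

data(woodmouse)

# Tree-building step wrapped in a function so it can be re-run on resampled data
nj_fun <- function(x) nj(dist.dna(x))

trw <- nj_fun(woodmouse)

# Resample alignment columns 100 times and count how often each clade reappears
set.seed(1)
bs <- boot.phylo(trw, woodmouse, nj_fun, B = 100, quiet = TRUE)

# Show the support values on the internal nodes
plot(trw, cex = 0.7)
nodelabels(bs, frame = "none", cex = 0.6, adj = c(1.1, -0.3))
```

`bs` holds one value per internal node of `trw`, in node order, so it can also be stored in `trw$node.label` before writing the tree to a file.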
  10. Phylogenetic inference, continued
      - Bayesian
        - You can do this (maybe) with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R...
        - ...however, FYI, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/)
  11. Phylogenetic inference: DIY
      - With your partners... or not
        - Use the sequence data you got from GenBank earlier
        - If you didn't align the sequences, don't worry about it; use a data set provided with ape or another package instead
        - Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)
  12. Visualize phylogenies
      - R packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo

            # visualize phylogenies
            install.packages("ape")
            require(ape)
            tree <- rcoal(10)
            tree
            plot(tree)
            plot(tree, type = "cladogram")
            plot(tree, type = "unrooted")
            plot(tree, type = "radial")
            plot(tree, type = "fan")
  13. Visualize phylogenies: DIY
      - Get together with a few other people... or not
        - Use the tree you made, or use one provided with ape or other packages
        - Do basic plotting, e.g.: plot(mytree)
        - Then see if you can
          - color the branches,
          - label the branches with the edge lengths,
          - change the tip labels,
          - etc.
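One possible way to approach the exercise above, sketched with ape's plotting helpers. The colors, tip names, and label placement are arbitrary choices for illustration.

```r
library(ape)

set.seed(42)
tree <- rcoal(6)

# Change the tip labels (hypothetical names)
tree$tip.label <- paste0("sp", seq_len(Ntip(tree)))

# One color per edge, in the same order as the rows of tree$edge
edge_cols <- rep(c("steelblue", "firebrick"), length.out = Nedge(tree))

# Color the branches
plot(tree, edge.color = edge_cols, edge.width = 2)

# Label each branch with its rounded length
edgelabels(round(tree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)

# Add a time axis under the tree
axisPhylo()
```

Because `edge.color` follows the row order of `tree$edge`, coloring a particular clade means first working out which rows belong to it.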
  14. Traits on trees: phylogenetic signal
      - R packages: ape, picante, caper, phytools
      - Examples from picante and phytools:

            # phylogenetic signal with picante
            install.packages("picante")
            require(picante)
            randtree <- rcoal(20)
            randtraits <- rTraitCont(randtree)
            Kcalc(randtraits[randtree$tip.label], randtree)

            # phylogenetic signal with phytools
            install.packages("phytools")
            require(phytools)
            tree <- rbdtree(1, 0, Tmax = 4)  # make a tree
            x <- fastBM(tree)                # simulate traits
            phylosig(tree, x, method = "lambda", test = TRUE)  # calculate phylogenetic signal, lambda
            phylosig(tree, x, method = "K", test = TRUE)       # calculate phylogenetic signal, K
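The idea behind a randomization test for phylogenetic signal can be sketched with ape alone. A trait with signal should give a smaller variance of phylogenetically independent contrasts than the same values shuffled across the tips. This hand-rolled version is a rough illustration and does not replace `Kcalc()` or `phylosig()`. The names `obs`, `null`, and `p_val`, and the 999 permutations, are assumptions made here.

```r
library(ape)

set.seed(1)
tree <- rcoal(20)
x <- rTraitCont(tree)  # Brownian-motion trait, named by tip label

# Observed variance of the independent contrasts
obs <- var(pic(x, tree))

# Null distribution: shuffle trait values among tips and recompute
null <- replicate(999, {
  x_shuffled <- setNames(sample(x), names(x))
  var(pic(x_shuffled, tree))
})

# One-sided p-value: how often is a shuffled variance as small as the observed one?
p_val <- (sum(null <= obs) + 1) / (length(null) + 1)

print(obs)
print(p_val)
```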
  15. Traits on trees: modeling trait evolution
      - R packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
      - These packages can model the evolution of discrete and continuous traits, under Brownian motion or Ornstein-Uhlenbeck (OU) models
      - See also:
        - Rbrownie
        - Various in-development evolutionary modeling frameworks to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa
        - here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
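To get a feel for how the two models differ before fitting anything, one can simulate a continuous trait under each with `rTraitCont()` from ape. The parameter values below (`sigma`, `alpha`, `theta`) and the tree size are arbitrary assumptions for illustration.

```r
library(ape)

set.seed(2011)
tree <- rcoal(50)

# Brownian motion: the trait wanders freely along each branch
bm <- rTraitCont(tree, model = "BM", sigma = 0.1)

# Ornstein-Uhlenbeck: the trait is pulled back toward the optimum theta
# with strength alpha
ou <- rTraitCont(tree, model = "OU", sigma = 0.1, alpha = 1, theta = 0)

# Compare the spread of tip values under the two models
print(summary(bm))
print(summary(ou))
```

Fitting these models to real data is a separate step that the packages listed above handle; this sketch covers simulation only.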
  16. Ancestral state reconstruction
      - R packages: ape, ouch, phytools
      - The function 'ace' in the ape package works nicely
      - But it is very sensitive to parameters
      - Example

            require(ape)
            data(bird.orders)
            x <- rnorm(23)  # one random trait value per tip (bird.orders has 23 tips)
            out <- ace(x, bird.orders)

      - out$ace will have the ancestral character values (which you'll have to match to the nodes of your tree)
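Matching the reconstructed values to nodes can be sketched as follows. ape numbers internal nodes from `Ntip + 1` upward, and the estimates in `out$ace` follow that order. The names `node_ids` and `anc` are illustrative, and the trait is random noise as in the example above.

```r
library(ape)

data(bird.orders)
set.seed(1)
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label  # tie each value to its tip explicitly

out <- ace(x, bird.orders, type = "continuous")

# Internal node numbers, in the same order as the estimates
node_ids <- Ntip(bird.orders) + seq_len(Nnode(bird.orders))
anc <- data.frame(node = node_ids, state = as.numeric(out$ace))
print(head(anc))

# Draw the estimates on the tree at the matching nodes
plot(bird.orders, cex = 0.7)
nodelabels(round(anc$state, 2), node = anc$node, frame = "none", cex = 0.6)
```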
  17. Tree simulations
      - R packages: TreeSim, geiger, ape, phybase
      - Example

            require(ape)
            tree <- rcoal(10)                         # make a random tree
            trait <- rTraitCont(tree, model = "BM")   # simulate a trait on that tree

            # Write a function to make a tree, simulate a BM trait,
            # and take the mean of that trait
            myfunc <- function(n) {
              tree <- rcoal(n)
              trait <- rTraitCont(tree, model = "BM")
              mean(trait)
            }

            # do it 100 times and make the data.frame required for ggplot2 plotting
            dat <- replicate(100, myfunc(10))
            dat2 <- data.frame(dat)

            # plot results
            require(ggplot2)
            ggplot(dat2, aes(dat)) + geom_histogram()
  18. Get trees
      - rOpenSci's treebase package
      - on CRAN: http://cran.r-project.org/web/packages/treebase/

            install.packages("treebase")  # install
            require(treebase)             # load
            tree <- search_treebase("Derryberry", "author")[[1]]  # search
            metadata(tree$S.id)           # metadata for tree
            plot(tree)                    # plot the tree
  19. Phylogenetic community structure
      - R packages: picante (includes phylocom functionality)
        - Although not bladj, for some reason; talk to me if you want to run bladj from R
      - Example: the function 'comdistnt' calculates the inter-community mean nearest taxon distance

            require(picante)
            data(phylocom)
            comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)

      - Also, a new approach to phylogenetic community structure in R from Matt Helmus, code here:
        http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
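To see what a between-community nearest taxon distance measures, here is a hand-rolled sketch for two made-up communities using only ape. It illustrates the idea and is not guaranteed to match picante's `comdistnt()` in every detail (for example, in how shared species are handled). The tree, community contents, and variable names are all hypothetical.

```r
library(ape)

set.seed(3)
tree <- rcoal(8, tip.label = letters[1:8])

# Pairwise phylogenetic distances between all tips
d <- cophenetic(tree)

# Two hypothetical communities with no shared species
comm1 <- c("a", "b", "c")
comm2 <- c("f", "g", "h")

# Distances from each species in comm1 (rows) to each species in comm2 (columns)
sub <- d[comm1, comm2]

# Nearest relative in the other community, in both directions, then average
nearest_1_to_2 <- apply(sub, 1, min)
nearest_2_to_1 <- apply(sub, 2, min)
mntd_between <- mean(c(nearest_1_to_2, nearest_2_to_1))

print(mntd_between)
```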
  20. Bonus: polytomy resolver
      - MEE paper: "A simple polytomy resolver for dated phylogenies" by Kuhn, Mooers, and Thomas
        - Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
        - Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
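For a quick look at what resolving a polytomy means, ape's `multi2di()` can be used. To be clear, this is not the method from the Kuhn, Mooers, and Thomas paper: `multi2di()` simply breaks each polytomy in random order and inserts zero-length branches, so it is a different and much cruder technique. The example tree is made up.

```r
library(ape)

# A small tree with one three-way polytomy (A, B, C)
poly <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
was_binary <- is.binary(poly)

# Randomly resolve the polytomy into a series of two-way splits
set.seed(1)
res <- multi2di(poly)
now_binary <- is.binary(res)

print(c(before = was_binary, after = now_binary))
print(res$edge.length)  # the newly inserted branch has length zero
```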
  21. Resources
      - Bodega Phylogenetics Wiki:
        - Home: http://bodegaphylo.wikispot.org/Front_Page
        - BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
        - Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
      - R phylo-wiki (from NESCent): http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
      - CRAN task view, Phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html
      - rmesquite: https://r-forge.r-project.org/R/?group_id=213
      - R-phylogenetics mailing list (r-sig-phylo): https://stat.ethz.ch/mailman/options/r-sig-phylo/