Phylogenetics in R

Talk given on 18 Nov, 2011 on doing phylogenetics in R.

Transcript

  • 1. Phylogenetics in R Scott Chamberlain November 18, 2011
  • 2. What sorts of phylogenetics things can I do in R?
  • 3. The run down
    • Get sequence data
    • Align sequence data
    • Phylogenetic inference
      • NJ, maxlik, parsimony, Bayesian, UPGMA
    • Visualize phylogenies
    • Traits on trees
      • Phylogenetic signal
      • Trait evolution
      • Ancestral state character reconstruction
    • Tree simulations
    • Get trees
    • Phylogenetic community structure
    • Bonus stuff: polytomy resolver
  • 4. Basic trees in R
    • Example
    • require(ape)
    • tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
    • tr1 # print tree summary
    • write.tree(tr1) # print tree in newick format "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
    • tr1$tip.label # tip labels "B" "C" "D" "A"
    • tr1$edge.length # edge lengths 0.04 0.01 0.05 0.05 0.06 0.10
    • tr1$node.label # node labels NULL [MEANING – no node labels]
    • # Assign properties to trees
    • tr1$tip.label <- c('sleepy','happy','grumpy','frumpy') # label tips
    • tr1$tip.label # did it work? "sleepy" "happy" "grumpy" "frumpy"
    • Etcetera for other tree properties
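A few more tree properties can be queried with helper functions from ape. This is a minimal sketch that rebuilds the same example tree; the choice of which properties to show is ours, not the slide's.

```r
library(ape)

# Same Newick string as in the example above
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")

Ntip(tr1)            # number of tips: 4
Nnode(tr1)           # number of internal nodes: 3
tr1$edge             # edge matrix: one row per branch (parent node, child node)
is.rooted(tr1)       # the root has two descendants, so the tree is rooted
is.ultrametric(tr1)  # all tips are the same distance (0.1) from the root
```

The edge matrix rows are in the same order as `tr1$edge.length`, which is what lets you match a branch length to a particular branch.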
  • 5. Get sequence data
    • # install and load ape
    • install.packages("ape"); require(ape)
    • # get data from GenBank
    • # make vector of accession numbers, for ITS 1 and 2 region for Gossypium (cotton) species
    • cotton_acc <- c("U56806", "U12712", "U56810",
    • "U12732", "U12725", "U56786", "U12715",
    • "AF057758", "U56790", "U12716", "U12729",
    • "U56798", "U12727", "U12713", "U12719",
    • "U56811", "U12728", "U12730", "U12731",
    • "U12722", "U56796", "U12714", "U56789",
    • "U56797", "U56801", "U56802", "U12718",
    • "U12710", "U56804", "U12734", "U56809",
    • "U56812", "AF057753", "U12711", "U12717",
    • "U12723", "U12726")
    • # get data from GenBank
    • require(ape)
    • cotton <- read.GenBank(cotton_acc, species.names = TRUE)
    • # name the sequences with species names instead of accession numbers
    • names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
    • names(cotton) <- attr(cotton, "species")
  • 6. Align sequence data run external: clustal, mafft
    • # multiple sequence alignment
    • ### Get clustalw here, and install: http://www.clustal.org/
    • # set to your working directory
    • setwd("/path on your computer to/ClustalW2")
    • # write fasta file to directory
    • write.dna(cotton, "cotton.fas", format = "fasta")
    • # run clustal multiple alignment, prints clustal output to console
    • system('"./clustalw2" cotton.fas') # should work on OSX or Windows
    • # read the alignment back in to R
    • cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")
    • Manual alignment may have to be done, dare I say it, outside of R
  • 7. Get and align sequences DIY
    • Get together with a few other people…or not
      • Choose some species to investigate
      • Get their accession numbers on GenBank
      • Download sequence data from Genbank
      • If you are really adventurous, also align sequences
  • 8. Phylogenetic inference Tools
    • R Packages: ape, phangorn, phyclust, phytools, scaleboot
    • ape has the most functionality for phylogenetic inference
    • You should be able to call MrBayes from R, but I don’t know how – package phyloch?
  • 9. Phylogenetic inference
    • Fitting evolutionary models: see the function modelTest in package phangorn
    • NJ
      • install.packages("ape"); require(ape)
      • data(woodmouse)
      • trw <- nj(dist.dna(woodmouse))
      • plot(trw)
    • Maximum likelihood
      • install.packages("phangorn"); require(phangorn)
      • data(Laurasiatherian)
      • dm <- dist.logDet(Laurasiatherian)
      • njtree <- NJ(dm)
      • MLfit <- pml(njtree, Laurasiatherian) # compute the likelihood of the NJ tree
      • MLfit_ <- optim.pml(MLfit, model = "GTR") # optimize the fit under a GTR model
      • MLfit_$tree
      • plot(MLfit_$tree)
    • Parsimony
      • install.packages("phangorn"); require(phangorn)
      • data(Laurasiatherian)
      • dm <- dist.logDet(Laurasiatherian)
      • tree <- NJ(dm)
      • treepars <- optim.parsimony(tree, Laurasiatherian)
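The slide points to modelTest for fitting evolutionary models without showing a call. A minimal sketch, assuming a reasonably recent phangorn; restricting the comparison to three models and switching off the gamma and invariant-site variants is our choice to keep the run short, not something from the talk.

```r
library(phangorn)

data(Laurasiatherian)

# Starting tree for the likelihood calculations
start_tree <- NJ(dist.logDet(Laurasiatherian))

# Compare a small set of nucleotide substitution models
# (G = FALSE and I = FALSE skip the +G and +I variants)
mt <- modelTest(Laurasiatherian, tree = start_tree,
                model = c("JC", "HKY", "GTR"), G = FALSE, I = FALSE)

# Rank the candidate models by AIC (lower is better)
mt[order(mt$AIC), ]
best <- mt$Model[which.min(mt$AIC)]
best
```

The best-ranked model name can then be passed as the `model` argument of optim.pml, as in the maximum likelihood example above.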
  • 10. Phylogenetic inference---Continued
    • Bayesian
      • You can do this (maybe) with the package phyloch (get here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R…
      • … however, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), fyi
  • 11. Phylogenetic inference DIY
    • With your partners…or not
      • Use the sequence data from GenBank you got earlier
      • (if you didn’t align the sequences, don’t worry about it – OR use data set provided with ape or other package)
      • Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)
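One way the exercise could go, sketched with the woodmouse data set that ships with ape rather than your own GenBank download. The variable names and the comparison by parsimony score and Robinson-Foulds distance are illustrative choices, not part of the talk.

```r
library(ape)
library(phangorn)

data(woodmouse)               # aligned DNA sequences shipped with ape
wm <- as.phyDat(woodmouse)    # phangorn works on phyDat objects

# Way 1: neighbor joining on pairwise distances
nj_tree <- nj(dist.dna(woodmouse))

# Way 2: parsimony search, starting from the NJ tree
pars_tree <- optim.parsimony(nj_tree, wm)

# Compare the two trees
parsimony(nj_tree, wm)        # parsimony score of the NJ tree
parsimony(pars_tree, wm)      # score after the parsimony search
RF.dist(nj_tree, pars_tree)   # topological (Robinson-Foulds) distance
```

A Robinson-Foulds distance of 0 would mean the two methods recovered the same topology.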
  • 12. Visualize phylogenies
    • R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo
    • # visualize phylogenies
    • install.packages("ape")
    • require(ape)
    • tree <- rcoal(10)
    • tree
    • plot(tree)
    • plot(tree, type = "cladogram")
    • plot(tree, type = "unrooted")
    • plot(tree, type = "radial")
    • plot(tree, type = "fan")
  • 13. Visualize phylogenies DIY
    • Get together with a few other people…or not
      • Use the tree you made, or use one provided with ape, or other packages
      • Do basic plotting, e.g.: plot(mytree)
      • Then see if you can
        • color the branches,
        • label the branches with the edge lengths
        • change the tip labels
        • etc.
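A possible solution to the plotting exercise, as a sketch using only ape. The random tree, the `sp_` tip names, and the colors are placeholders; substitute your own tree for `mytree`.

```r
library(ape)

set.seed(1)
mytree <- rcoal(8)   # stand-in for your own tree

# Change the tip labels
mytree$tip.label <- paste0("sp_", seq_len(Ntip(mytree)))

# One color per row of mytree$edge: terminal branches blue, internal ones grey
edge_cols <- ifelse(mytree$edge[, 2] <= Ntip(mytree), "steelblue", "grey30")

plot(mytree, edge.color = edge_cols, edge.width = 2, cex = 0.9)

# Label each branch with its (rounded) edge length
edgelabels(round(mytree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)

axisPhylo()          # add a time axis below the tree
```

A branch is terminal when its child node number (second column of the edge matrix) is at most the number of tips, which is how the two colors are assigned.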
  • 14. Traits on trees phylogenetic signal
    • R Packages: ape, picante, caper, phytools
    • Examples from picante and phytools:
    • # phylogenetic signal
    • install.packages("picante")
    • require(picante)
    • randtree <- rcoal(20)
    • randtraits <- rTraitCont(randtree)
    • Kcalc(randtraits[randtree$tip.label],randtree)
    • install.packages("phytools")
    • require(phytools)
    • tree <- rbdtree(1,0,Tmax=4) # make a tree
    • x <- fastBM(tree) # simulate traits
    • phylosig(tree, x, method="lambda", test=TRUE) # calculate phylogenetic signal, lambda
    • phylosig(tree, x, method="K", test=TRUE) # calculate phylogenetic signal, K
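To get a feel for what Blomberg's K measures, it can help to compare a trait simulated under Brownian motion with the same values shuffled across the tips. This is an illustrative sketch using picante's Kcalc; the seed, tree size, and shuffling step are our additions.

```r
library(picante)

set.seed(2011)
tree <- rcoal(30)

# Trait evolved along the tree under Brownian motion
bm_trait <- rTraitCont(tree, model = "BM")

# Same values, randomly reassigned to tips (breaks the link to the tree)
shuffled <- setNames(sample(unname(bm_trait)), names(bm_trait))

k_bm       <- Kcalc(bm_trait[tree$tip.label], tree)
k_shuffled <- Kcalc(shuffled[tree$tip.label], tree)

c(BM = as.numeric(k_bm), shuffled = as.numeric(k_shuffled))
```

The shuffled trait will typically give a lower K than the Brownian motion trait, although any single random draw can vary.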
  • 15. Traits on trees modeling trait evolution
    • R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
    • The packages above can model the evolution of discrete and continuous traits, under Brownian motion or Ornstein-Uhlenbeck (OU) models
    • See also:
    • Rbrownie
    • Various dev evol modeling frameworks to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa
    • here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
  • 16. Ancestral state reconstruction
    • R Packages: ape, ouch, phytools
    • Function ‘ace’ in the ape package works nicely
    • But very sensitive to parameters
    • Example
    • data(bird.orders)
    • x <- rnorm(23)
    • out <- ace(x, bird.orders)
    • out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
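Matching the estimates to nodes could look like the sketch below. It assumes that ace returns one estimate per internal node in node-number order (internal nodes are numbered from Ntip + 1 upward), which is how nodelabels numbers them as well; check `names(out$ace)` on your ape version to confirm.

```r
library(ape)

data(bird.orders)
set.seed(42)

# Random continuous trait, named by tip so values are matched explicitly
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label

out <- ace(x, bird.orders)   # default: continuous trait

# Assumption: estimates are ordered by internal node number
node_est <- data.frame(
  node     = Ntip(bird.orders) + seq_len(Nnode(bird.orders)),
  estimate = unname(out$ace)
)
head(node_est)

# Show the estimates on the tree
plot(bird.orders, cex = 0.7)
nodelabels(round(node_est$estimate, 2), node = node_est$node,
           frame = "none", cex = 0.6)
```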
  • 17. Tree simulations
    • R Packages: TreeSim, geiger, ape, phybase
    • Example
    • require(ape)
    • tree <- rcoal(10) # Make a random tree
    • trait <- rTraitCont(tree, model = "BM") # Simulate a trait on that tree
    • # Write a function to make a tree, simulate a BM trait, and take the mean of that trait
    • myfunc <- function(n) {
    • tree <- rcoal(n)
    • trait <- rTraitCont(tree, model = "BM")
    • mean(trait)
    • }
    • # do it 100 times and make a data.frame required for ggplot2 plotting
    • dat <- replicate(100, myfunc(10))
    • dat2 <- data.frame(dat)
    • # plot results
    • require(ggplot2)
    • ggplot(dat2, aes(dat)) + geom_histogram()
  • 18. Get trees
    • rOpenSci’s treeBASE package
    • on CRAN: http://cran.r-project.org/web/packages/treebase/
    • install.packages("treebase") # install
    • require(treebase) # load
    • tree <- search_treebase("Derryberry", "author")[[1]] # search
    • metadata(tree$S.id) # metadata for tree
    • plot(tree) # plot the tree
  • 19. Phylogenetic community structure
    • R Packages: picante (includes phylocom functionality)
    • --Although, not bladj for some reason, talk to me if you want to run bladj from R
    • Example
    • Function ‘comdistnt’ calculates inter-community mean nearest taxon distance
    • data(phylocom)
    • comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted=FALSE)
    • Also, new approach to phycommstruct in R from Matt Helmus, code here:
    • http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
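Beyond comdistnt, picante provides within-community metrics that work on the same phylocom example data. A short sketch; picking pd and mpd here is our selection from the package, not something covered in the talk.

```r
library(picante)

data(phylocom)

# Faith's phylogenetic diversity (PD) and species richness (SR) per community
pd_res <- pd(phylocom$sample, phylocom$phylo)
pd_res

# Mean pairwise phylogenetic distance among co-occurring species, per community
mpd_res <- mpd(phylocom$sample, cophenetic(phylocom$phylo),
               abundance.weighted = FALSE)
mpd_res
```

Both functions take the community matrix (communities in rows, species in columns); pd takes the tree itself, while mpd takes the cophenetic distance matrix, as comdistnt does.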
  • 20. Bonus: Polytomy resolver
    • MEE paper: “A simple polytomy resolver for dated phylogenies”
    • by Kuhn, Mooers, and Thomas
      • Paper
      • http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
      • Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
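For comparison, ape has a much simpler built-in option, multi2di, which resolves polytomies at random by inserting zero-length branches. To be clear, this is not the Kuhn, Mooers, and Thomas method (use the scripts in their supplementary info for that); the toy tree below is made up for illustration.

```r
library(ape)

# Toy tree with a three-way polytomy (A, B, C)
poly <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
is.binary(poly)        # FALSE: the tree is not fully bifurcating

set.seed(1)
resolved <- multi2di(poly)   # random resolution
is.binary(resolved)    # TRUE

resolved$edge.length   # the added branch has length 0
plot(resolved)
```

Because the new branches have zero length, the result is dichotomous in topology only; it carries no information about when the lineages split.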
  • 21. Resources
    • Bodega Phylogenetics Wiki:
      • Home: http://bodegaphylo.wikispot.org/Front_Page
      • BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
      • Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
    • R phylo-wiki (from NESCent):
    • http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
    • CRAN task view, Phylogenetics:
    • http://cran.r-project.org/web/views/Phylogenetics.html
    • rmesquite: https://r-forge.r-project.org/R/?group_id=213
    • R-phylogenetics listserv:
      • https://stat.ethz.ch/mailman/options/r-sig-phylo/