phycas lightning talk iEvoBio 2011

Multi-tree steppingstone sampling for estimating the marginal likelihoods of phylogenetic models is introduced. The method is implemented in Phycas. The slides briefly sketch out the theory behind the method and introduce a probability distribution over tree topologies that can be "centered" around a tree of interest.

Usage rights: CC Attribution-ShareAlike License


phycas lightning talk iEvoBio 2011 Presentation Transcript

  • 1. Estimating marginal likelihoods for phylogenetic models in Phycas
    Phycas is a software package for Bayesian phylogenetic inference (with support for ML searching planned).
    Paul Lewis is the primary author. Mark Holder and Dave Swofford are co-authors.
    Written in C++ and Python (using boost-python to create Python bindings to C++ code).
    Compiled versions and manual: http://www.phycas.org
    Source: https://github.com/mtholder/Phycas
  • 2. Bayesian model selection
    • Use model averaging if we can “jump” between models, or
    • Compare their marginal likelihoods.
    The Bayes factor between two models:
    $$B_{10} = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_0)}$$
    $$\Pr(D \mid M_1) = \int \Pr(D \mid \theta, M_1) \, \Pr(\theta) \, d\theta$$
    where θ is the set of parameters in the model.
  • 3. Two simple estimators of the marginal likelihood
    1. Mean of the likelihood evaluated at parameter values randomly drawn from the prior.
    2. Harmonic mean of the likelihood evaluated at parameter values randomly drawn from the posterior (Newton and Raftery, 1994).
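A minimal sketch of the two estimators on slide 3, using a toy model that is not from the talk: a single observation y drawn from Normal(θ, σ) with a Normal(0, τ) prior on θ, chosen because its marginal likelihood is known in closed form. The function name `toy_estimates` and all parameter values are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def toy_estimates(y=1.0, sigma=0.1, tau=1.0, n=20000, seed=1):
    """Return (exact, prior-mean estimate, harmonic-mean estimate) of Pr(y).

    Toy model (an assumption, not the talk's): y ~ Normal(theta, sigma),
    theta ~ Normal(0, tau), so that y ~ Normal(0, sqrt(sigma^2 + tau^2)).
    """
    rng = random.Random(seed)
    exact = normal_pdf(y, 0.0, math.sqrt(sigma**2 + tau**2))

    # Estimator 1: arithmetic mean of the likelihood over draws from the prior.
    prior_draws = [rng.gauss(0.0, tau) for _ in range(n)]
    prior_mean = sum(normal_pdf(y, t, sigma) for t in prior_draws) / n

    # Estimator 2: harmonic mean of the likelihood over draws from the
    # posterior, which is available in closed form for this conjugate model.
    post_var = (sigma**2 * tau**2) / (sigma**2 + tau**2)
    post_mean = y * tau**2 / (sigma**2 + tau**2)
    post_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(n)]
    harmonic = n / sum(1.0 / normal_pdf(y, t, sigma) for t in post_draws)

    return exact, prior_mean, harmonic
```

Calling `toy_estimates()` returns the exact value followed by the two estimates. In this toy setting the reciprocal likelihood has infinite variance under the posterior (because σ < τ), so the harmonic mean estimate can differ noticeably between seeds.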
  • 4. [Figure: sharp posterior (black) and prior (red); density plotted against x.]
  • 5. From Dr. Radford Neal’s blog
    The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever
    “The total unsuitability of the harmonic mean estimator should have been apparent within an hour of its discovery.”
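  • 6. Steppingstone sampling (Xie et al., 2010; Fan et al., 2010) blends two distributions:
    • the posterior, Pr(D|θ, M1) Pr(θ, M1)
    • a tractable reference distribution, π(θ)
    $$p_\beta(\theta \mid D, M_1) = \frac{[\Pr(D \mid \theta, M_1) \, \Pr(\theta, M_1)]^{\beta} \, [\pi(\theta)]^{(1-\beta)}}{c_\beta}$$
    $$c_0 = 1.0$$
    $$\Pr(D \mid M_1) = \frac{c_1}{c_0} = \frac{c_1 \, c_{0.38} \, c_{0.1} \, c_{0.01}}{c_{0.38} \, c_{0.1} \, c_{0.01} \, c_0} = \left(\frac{c_1}{c_{0.38}}\right) \left(\frac{c_{0.38}}{c_{0.1}}\right) \left(\frac{c_{0.1}}{c_{0.01}}\right) \left(\frac{c_{0.01}}{c_0}\right)$$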
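  • 7. $$\frac{c_1}{c_0} = \left(\frac{c_1}{c_{0.38}}\right) \left(\frac{c_{0.38}}{c_{0.1}}\right) \left(\frac{c_{0.1}}{c_{0.01}}\right) \left(\frac{c_{0.01}}{c_0}\right)$$
    Photo by Johan Nobel, http://www.flickr.com/photos/43147325@N08/4326713557/, downloaded from Wikimedia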
  • 8. Typically, steppingstone sampling uses a series of slightly vaguer distributions to estimate the ratio of normalizing constants:
    [Figure: steppingstone densities; density plotted against x.]
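The product of ratios from slides 6 to 8 can be sketched on a toy problem. This is an illustrative sketch under stated assumptions, not the Phycas implementation: the model (one observation y ~ Normal(θ, σ) with a Normal(0, τ) prior), the Normal reference distribution, and the function name `steppingstone` are all hypothetical. The toy model is chosen because every blended distribution p_β is itself Normal and can be sampled directly, where a real analysis would need MCMC. Each ratio c_b'/c_b is estimated as the average of (q(θ)/π(θ))^(b'−b) over draws from p_b, where q is the unnormalized posterior; the β values are the ones shown on the slide.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def steppingstone(y=1.0, sigma=0.1, tau=1.0, ref_mean=0.9, ref_sd=0.3,
                  betas=(0.0, 0.01, 0.1, 0.38, 1.0), n=5000, seed=1):
    """Estimate log Pr(y) as a sum of log ratios log(c_next / c_prev)."""
    rng = random.Random(seed)

    def log_q(t):  # log of likelihood times prior (unnormalized posterior)
        return log_normal_pdf(y, t, sigma) + log_normal_pdf(t, 0.0, tau)

    def log_ref(t):  # log of the normalized reference density pi(theta)
        return log_normal_pdf(t, ref_mean, ref_sd)

    log_ml = 0.0  # log c_0 = 0 because the reference density is normalized
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # p_b is Normal here: precisions and precision-weighted means add.
        prec = b_prev * (1 / sigma**2 + 1 / tau**2) + (1 - b_prev) / ref_sd**2
        mean = (b_prev * y / sigma**2 + (1 - b_prev) * ref_mean / ref_sd**2) / prec
        sd = math.sqrt(1 / prec)
        log_w = [(b_next - b_prev) * (log_q(t) - log_ref(t))
                 for t in (rng.gauss(mean, sd) for _ in range(n))]
        # Average the weights in log space to avoid underflow.
        top = max(log_w)
        log_ml += top + math.log(sum(math.exp(w - top) for w in log_w) / n)
    return log_ml
```

For this toy model the result can be checked against the closed-form value `log_normal_pdf(y, 0.0, math.sqrt(sigma**2 + tau**2))`.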
  • 9. A reference distribution over tree topologies
    We must be able to:
    1. calculate the probability for any tree topology,
    2. center the distribution on the posterior,
    3. control the “vagueness” of the distribution,
    4. efficiently sample trees from the distribution.
  • 10. Tree-Centered Independent-Split-Probability (TCISP) distribution
    Argument: a tree with probabilities for each split.
    Result: a probability distribution over all tree topologies.
  • 11. Input: a focal tree to center the distribution, with split probabilities.
    [Figure: focal tree on the leaves A, C, D, E, F, G, H, I, J, K, L, with a probability labeling each internal split.]
  • 12. We will keep the blue branches and avoid the red ones.
    [Figure: the focal tree with internal branches colored blue or red.]
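The slides do not say how branches are assigned to "keep" (blue) or "avoid" (red). One reading, suggested only by the name "Independent-Split-Probability", is that each split of the focal tree is kept independently with its own probability. The sketch below illustrates that assumption; the six-leaf focal tree, its probabilities, and both function names are hypothetical and are not taken from Phycas.

```python
import random


def draw_split_pattern(split_probs, rng):
    """Independently mark each focal split as kept (True) or avoided (False)."""
    return {split: rng.random() < p for split, p in split_probs.items()}


def pattern_probability(split_probs, pattern):
    """Probability of one keep/avoid pattern under the independence assumption."""
    prob = 1.0
    for split, p in split_probs.items():
        prob *= p if pattern[split] else (1.0 - p)
    return prob


# Hypothetical focal tree on leaves A-F. Each internal branch is written as
# the set of leaves on one side of the split, mapped to its split probability.
focal = {
    frozenset("AB"): 0.9,
    frozenset("ABC"): 0.5,
    frozenset("EF"): 0.8,
}

pattern = draw_split_pattern(focal, random.Random(0))
print(pattern, pattern_probability(focal, pattern))
```

Under this reading, a pattern fixes which splits a sampled tree must contain and which it must not; turning that into a probability for a specific topology would additionally require counting the trees consistent with the pattern.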
  • 13. [Figure: tree diagram on the leaves A, C, D, E, F, G, H, I, J, K, L; no caption.]
  • 14. One of the many resolutions that avoid the red branches.
    [Figure: fully resolved tree on the leaves A, C, D, E, F, G, H, I, J, K, L.]
  • 15. [Figure: two tree diagrams on the leaves A, C, D, E, F, G, H, I, J, K, L; no caption.]
  • 16. Counting trees:
    Bryant and Steel (2009) provide an $O(n^5)$ algorithm for counting the number of trees that share no splits with another tree.
    Multitree steppingstone:
    • Works on tiny trees (≤ 6 leaves) with no tuning;
    • We are working on more efficient MCMC for larger trees;
    • Code on: https://github.com/mtholder/Phycas/tree/sampling_ref_dist
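For a sense of scale, the number of unrooted, fully resolved topologies on n labeled leaves is the standard double factorial (2n − 5)!!. The short sketch below computes it; it assumes unrooted binary trees (the slides do not state this), and the function name is hypothetical. It is not the Bryant and Steel algorithm, which counts trees sharing no splits with a given tree.

```python
def num_unrooted_binary_trees(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


print(num_unrooted_binary_trees(6))   # 105
print(num_unrooted_binary_trees(11))  # 34459425
```

Six leaves give 105 topologies, while 11 leaves (the number of leaf labels in the example tree on the slides) give more than 34 million.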
  • 17. Conclusions
    • Do not trust the harmonic mean estimator of the marginal likelihood.
    • Take a look at Phycas: http://www.phycas.org (under GPLv2.0; source on GitHub).
    • Watch for multitree steppingstone in a more generic, usable form soon.
    • The Tree-Centered Independent-Split-Probability (TCISP) distribution may be useful in other contexts: likelihood-based supertrees, or MCMC proposals.
  • 18. Thanks: NSF AToL and iEvoBio
    See: Xie et al. (2010); Fan et al. (2010); Lartillot and Philippe (2006) for more discussion of estimating marginal likelihoods.
  • 19. References
    Bryant, D. and Steel, M. (2009). Computing the distribution of a tree metric. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(3):420–426.
    Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2010). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, (advance access).
    Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2):195–207.
    Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56(1):3–48.
  • 20. Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2010). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.