phycas lightning talk iEvoBio 2011


This talk introduces multi-tree steppingstone sampling for estimating the marginal likelihoods of phylogenetic models. The method is implemented in Phycas. The slides briefly sketch the theory behind the method and introduce a probability distribution over tree topologies that can be "centered" on a tree of interest.

Published in: Technology, Education

  1. 1. Estimating marginal likelihoods for phylogenetic models in Phycas. Phycas is a software package for Bayesian phylogenetic inference (with support for ML searching planned). Paul Lewis is the primary author. Mark Holder and Dave Swofford are co-authors. Written in C++ and Python (using boost-python to create Python bindings to C++ code). Compiled versions and manual: http://www.phycas.org Source:
  2. 2. Bayesian model selection
     • Use model averaging if we can “jump” between models, or
     • Compare their marginal likelihoods.
     The Bayes factor between two models:
     $B_{10} = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_0)}$
     $\Pr(D \mid M_1) = \int \Pr(D \mid \theta, M_1) \, \Pr(\theta) \, d\theta$
     where θ is the set of parameters in the model.
  3. 3. Two simple estimators of the marginal likelihood
     1. The mean of the likelihood evaluated at parameter values randomly drawn from the prior.
     2. The harmonic mean of the likelihood evaluated at parameter values randomly drawn from the posterior (Newton and Raftery, 1994).
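The two estimators on this slide can be tried on a toy problem where the answer is known. The sketch below is not taken from Phycas: it assumes a hypothetical model with i.i.d. Normal(θ, 1) data and a Normal(0, τ²) prior on θ, chosen only because its marginal likelihood has a closed form and its posterior can be sampled directly. All function names and the data values are illustrative.

```python
import math
import random

# Hypothetical toy data: assumed i.i.d. Normal(theta, 1) observations.
DATA = [0.3, -0.5, 1.2, 0.8, 0.1]


def log_likelihood(theta, data):
    """Log-likelihood of i.i.d. Normal(theta, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)


def log_mean_exp(log_values):
    """Numerically stable log of the arithmetic mean of exp(log_values)."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


def exact_log_marginal(data, tau):
    """Closed-form log marginal likelihood under a Normal(0, tau^2) prior."""
    n, s, ss = len(data), sum(data), sum(y * y for y in data)
    v = 1 + n * tau ** 2
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(v) - 0.5 * (ss - tau ** 2 * s ** 2 / v)


def prior_mean_estimate(data, tau, n_draws, rng):
    """Estimator 1: average the likelihood over draws from the prior."""
    logs = [log_likelihood(rng.gauss(0.0, tau), data) for _ in range(n_draws)]
    return log_mean_exp(logs)


def harmonic_mean_estimate(data, tau, n_draws, rng):
    """Estimator 2: harmonic mean of the likelihood over posterior draws."""
    post_var = 1.0 / (len(data) + 1.0 / tau ** 2)  # conjugate normal posterior
    post_mean = post_var * sum(data)
    post_sd = math.sqrt(post_var)
    neg_logs = [-log_likelihood(rng.gauss(post_mean, post_sd), data) for _ in range(n_draws)]
    return -log_mean_exp(neg_logs)  # log of 1 / mean(1 / L)


if __name__ == "__main__":
    rng = random.Random(1)
    print("exact        ", exact_log_marginal(DATA, 1.0))
    print("prior mean   ", prior_mean_estimate(DATA, 1.0, 20000, rng))
    print("harmonic mean", harmonic_mean_estimate(DATA, 1.0, 20000, rng))
```

In this toy setting the prior-mean estimator behaves well because the prior is not much wider than the likelihood. The harmonic-mean estimator has infinite variance here, so repeated runs can disagree noticeably, which is the behaviour the following slides warn about.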
  4. 4. [Figure: sharp posterior (black) and prior (red); density plotted against x]
  5. 5. From Dr. Radford Neal’s blog, “The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever”: “The total unsuitability of the harmonic mean estimator should have been apparent within an hour of its discovery.”
  6. 6. Steppingstone sampling (Xie et al., 2010; Fan et al., 2010) blends two distributions:
     • the posterior, $\Pr(D \mid \theta, M_1) \, \Pr(\theta, M_1)$
     • a tractable reference distribution, $\pi(\theta)$
     $p_\beta(\theta \mid D, M_1) = \frac{[\Pr(D \mid \theta, M_1) \, \Pr(\theta, M_1)]^{\beta} \, [\pi(\theta)]^{(1-\beta)}}{c_\beta}$
     $c_0 = 1.0$
     $\Pr(D \mid M_1) = \frac{c_1}{c_0} = \frac{c_1}{c_{0.38}} \cdot \frac{c_{0.38}}{c_{0.1}} \cdot \frac{c_{0.1}}{c_{0.01}} \cdot \frac{c_{0.01}}{c_0}$
  7. 7. $\frac{c_1}{c_0} = \frac{c_1}{c_{0.38}} \cdot \frac{c_{0.38}}{c_{0.1}} \cdot \frac{c_{0.1}}{c_{0.01}} \cdot \frac{c_{0.01}}{c_0}$ [Photo by Johan Nobel, downloaded from Wikimedia]
  8. 8. Typically, steppingstone sampling uses a series of slightly vaguer distributions to estimate the ratio of normalizing constants. [Figure: steppingstone densities; density plotted against x]
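The product of ratios c_1/c_0 can be illustrated on a one-parameter toy problem. This is a minimal sketch, not the Phycas implementation: it assumes i.i.d. Normal(θ, 1) data, a Normal(0, τ²) prior, and a hypothetical Normal reference distribution π(θ), so that every blended density p_β is itself normal and can be sampled exactly (a phylogenetic analysis would use MCMC instead). Each ratio c_b'/c_b is estimated as the average of (posterior kernel / π)^(b' − b) over draws from p_b. The β values are the ones shown on the slides; all names and data values are illustrative.

```python
import math
import random

# Hypothetical toy setup: i.i.d. Normal(theta, 1) data, Normal(0, TAU^2) prior.
DATA = [0.3, -0.5, 1.2, 0.8, 0.1]
TAU = 1.0
REF_MEAN, REF_SD = 0.0, 2.0  # assumed reference distribution pi(theta)
BETAS = [0.0, 0.01, 0.1, 0.38, 1.0]  # stones, as on the slides


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_posterior_kernel(theta):
    """log[ Pr(D | theta) Pr(theta) ], the unnormalized posterior."""
    return sum(log_normal_pdf(y, theta, 1.0) for y in DATA) + log_normal_pdf(theta, 0.0, TAU)


def sample_blend(beta, rng):
    """Exact draw from p_beta, which is normal in this conjugate toy model."""
    precision = beta * (len(DATA) + 1.0 / TAU ** 2) + (1.0 - beta) / REF_SD ** 2
    linear = beta * sum(DATA) + (1.0 - beta) * REF_MEAN / REF_SD ** 2
    return rng.gauss(linear / precision, math.sqrt(1.0 / precision))


def steppingstone_log_marginal(betas, n_draws, rng):
    """Sum of log ratio estimates log(c_b' / c_b) over adjacent stones."""
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        log_terms = []
        for _ in range(n_draws):
            theta = sample_blend(b_prev, rng)
            log_ratio = log_posterior_kernel(theta) - log_normal_pdf(theta, REF_MEAN, REF_SD)
            log_terms.append((b_next - b_prev) * log_ratio)
        m = max(log_terms)  # log-mean-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in log_terms) / n_draws)
    return total


def exact_log_marginal():
    """Closed-form answer for this toy model, for comparison."""
    n, s, ss = len(DATA), sum(DATA), sum(y * y for y in DATA)
    v = 1 + n * TAU ** 2
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(v) - 0.5 * (ss - TAU ** 2 * s ** 2 / v)


if __name__ == "__main__":
    rng = random.Random(7)
    print("steppingstone", steppingstone_log_marginal(BETAS, 5000, rng))
    print("exact        ", exact_log_marginal())
```

Because c_0 = 1 when π(θ) is normalized, the summed log ratios estimate log Pr(D | M1) directly. Each individual ratio compares two similar densities, which is why the estimate is far more stable than the harmonic mean on the same problem.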
  9. 9. A reference distribution over tree topologies
     We must be able to:
     1. calculate the probability for any tree topology,
     2. center the distribution on the posterior,
     3. control the “vagueness” of the distribution,
     4. efficiently sample trees from the distribution.
  10. 10. Tree-Centered Independent-Split-Probability (TCISP) distribution
      Argument: a tree with probabilities for each split.
      Result: a probability distribution over all tree topologies.
  11. 11. Input: a focal tree to center the distribution, with split probabilities. [Figure: focal tree on leaves A, C, D, E, F, G, H, I, J, K, L with a probability on each split]
  12. 12. We will keep the blue branches and avoid the red ones. [Figure: focal tree with branches colored blue or red]
  13. 13. [Figure: tree diagram on the same leaves]
  14. 14. One of the many resolutions that avoid the red branches. [Figure: tree diagram on the same leaves]
  15. 15. [Figure: two tree diagrams on the same leaves]
  16. 16. Counting trees: Bryant and Steel (2009) provide an $O(n^5)$ algorithm for counting the number of trees that share no splits with another tree.
      Multitree steppingstone:
      • Works on tiny trees (≤ 6 leaves) with no tuning;
      • We are working on more efficient MCMC for larger trees;
      • Code on: sampling_ref_dist
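For a sense of scale behind "tiny trees", the total number of unrooted, fully resolved topologies on n labeled leaves is the double factorial (2n − 5)!!. The sketch below computes only that total; it is not the Bryant and Steel (2009) algorithm for counting trees that share no splits with a given tree, and the function name is illustrative.

```python
def count_unrooted_binary_topologies(n_leaves):
    """Number of unrooted binary tree topologies on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in range(3, 7):
        print(n, count_unrooted_binary_topologies(n))
```

With 6 leaves there are 105 topologies, few enough to enumerate. The count grows super-exponentially with n, which is why any quantity that must be summed or counted over topologies needs an efficient algorithm for larger trees.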
  17. 17. Conclusions
      • Do not trust the harmonic mean estimator of the marginal likelihood.
      • Take a look at Phycas: (under GPLv2.0; source on GitHub).
      • Watch for multitree steppingstone in a more generic, usable form soon.
      • The Tree-Centered Independent-Split-Probability (TCISP) distribution may be useful in other contexts: likelihood-based supertrees, or MCMC proposals.
  18. 18. Thanks: NSF AToL and iEvoBio. See Xie et al. (2010), Fan et al. (2010), and Lartillot and Philippe (2006) for more discussion of estimating marginal likelihoods.
  19. 19. References
      Bryant, D. and Steel, M. (2009). Computing the distribution of a tree metric. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(3):420–426.
      Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2010). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution (advance access).
      Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2):195–207.
      Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56(1):3–48.
  20. 20. Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2010). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.