
# phycas lightning talk iEvoBio 2011


Introduces multi-tree steppingstone sampling for estimating the marginal likelihoods of phylogenetic models. The method is implemented in Phycas. The slides briefly sketch the theory behind the method and introduce a probability distribution over tree topologies that can be "centered" around a tree of interest.

Published in: Technology, Education

### phycas lightning talk iEvoBio 2011

1. Estimating marginal likelihoods for phylogenetic models in Phycas

    Phycas is a software package for Bayesian phylogenetic inference (with support for ML searching planned). Paul Lewis is the primary author. Mark Holder and Dave Swofford are co-authors. Written in C++ and Python (using boost-python to create Python bindings to C++ code).

    Compiled versions and manual: http://www.phycas.org
    Source: https://github.com/mtholder/Phycas
2. Bayesian model selection
    - Use model averaging if we can "jump" between models, or
    - Compare their marginal likelihoods.

    The Bayes factor between two models:

    $$B_{10} = \frac{\Pr(D \mid M_1)}{\Pr(D \mid M_0)}$$

    $$\Pr(D \mid M_1) = \int \Pr(D \mid \theta, M_1)\,\Pr(\theta)\,d\theta$$

    where $\theta$ is the set of parameters in the model.
3. Two simple estimators of the marginal likelihood
    1. Mean of the likelihood evaluated at parameter values randomly drawn from the prior.
    2. Harmonic mean of the likelihood evaluated at parameter values randomly drawn from the posterior (Newton and Raftery, 1994).
4. [Figure: sharp posterior (black) and prior (red); density (0 to 40) plotted against x (−2 to 2).]
5. From Dr. Radford Neal's blog, "The Harmonic Mean of the Likelihood: Worst Monte Carlo Method Ever":

    "The total unsuitability of the harmonic mean estimator should have been apparent within an hour of its discovery."
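The two simple estimators above can be tried on a toy problem where the marginal likelihood is known exactly. The sketch below is not Phycas code and is not a phylogenetic model: it assumes a single observation y ~ N(θ, σ²) with prior θ ~ N(0, τ²), for which Pr(D|M) is the N(0, σ² + τ²) density at y and the posterior can be sampled directly. All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def exact_marginal(y, sigma, tau):
    """Analytic Pr(D|M) for the toy model: y ~ N(theta, sigma^2), theta ~ N(0, tau^2)."""
    return normal_pdf(y, 0.0, math.sqrt(sigma ** 2 + tau ** 2))


def prior_mean_estimate(y, sigma, tau, n, rng):
    """Estimator 1: average the likelihood over draws from the prior."""
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(0.0, tau)
        total += normal_pdf(y, theta, sigma)
    return total / n


def harmonic_mean_estimate(y, sigma, tau, n, rng):
    """Estimator 2: harmonic mean of the likelihood over draws from the posterior."""
    shrink = tau ** 2 / (sigma ** 2 + tau ** 2)
    post_mean = shrink * y                 # conjugate normal posterior mean
    post_sd = math.sqrt(shrink) * sigma    # conjugate normal posterior sd
    inverse_total = 0.0
    for _ in range(n):
        theta = rng.gauss(post_mean, post_sd)
        inverse_total += 1.0 / normal_pdf(y, theta, sigma)
    return n / inverse_total


if __name__ == "__main__":
    rng = random.Random(2011)
    y, sigma, tau = 0.5, 1.0, 1.0
    print("exact        ", exact_marginal(y, sigma, tau))
    print("prior mean   ", prior_mean_estimate(y, sigma, tau, 20000, rng))
    print("harmonic mean", harmonic_mean_estimate(y, sigma, tau, 20000, rng))
```

Rerunning with different seeds, or with a much smaller `sigma` so that the posterior is sharp relative to the prior, is a quick way to see how each estimator behaves; the reciprocal likelihoods in the harmonic mean can be dominated by a few draws from the posterior tails.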
6. Steppingstone sampling (Xie et al., 2010; Fan et al., 2010) blends two distributions:
    - the posterior, $\Pr(D \mid \theta, M_1)\,\Pr(\theta, M_1)$
    - a tractable reference distribution, $\pi(\theta)$

    $$p_\beta(\theta \mid D, M_1) = \frac{[\Pr(D \mid \theta, M_1)\,\Pr(\theta, M_1)]^{\beta}\,[\pi(\theta)]^{(1-\beta)}}{c_\beta}$$

    $$c_0 = 1.0$$

    $$\Pr(D \mid M_1) = \frac{c_1}{c_0} = \frac{c_1}{c_{0.38}} \cdot \frac{c_{0.38}}{c_{0.1}} \cdot \frac{c_{0.1}}{c_{0.01}} \cdot \frac{c_{0.01}}{c_0}$$
7. $$\frac{c_1}{c_0} = \frac{c_1}{c_{0.38}} \cdot \frac{c_{0.38}}{c_{0.1}} \cdot \frac{c_{0.1}}{c_{0.01}} \cdot \frac{c_{0.01}}{c_0}$$

    [Photo.] Photo by Johan Nobel, http://www.flickr.com/photos/43147325@N08/4326713557/, downloaded from Wikimedia.
8. Typically, steppingstone sampling uses a series of slightly vaguer distributions to estimate the ratio of normalizing constants.

    [Figure: steppingstone densities; density (0 to 40) plotted against x (−2 to 2).]
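The telescoping product of ratios can be sketched numerically. Each ratio c_b / c_a (with a < b) equals the expectation, under p_a, of (q(θ)/π(θ))^(b − a), where q(θ) is the likelihood times the prior. The example below is a minimal illustration under stated assumptions, not the Phycas implementation: it reuses a toy normal model (y ~ N(θ, σ²), θ ~ N(0, τ²)), takes a normal reference distribution, and draws from each p_β exactly rather than by MCMC. The function and parameter names are hypothetical; the β values are the ones shown on the slides.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def exact_log_marginal(y, sigma, tau):
    """Analytic log Pr(D|M) for y ~ N(theta, sigma^2), theta ~ N(0, tau^2)."""
    return log_normal_pdf(y, 0.0, math.sqrt(sigma ** 2 + tau ** 2))


def steppingstone_log_marginal(y, sigma, tau, ref_mean, ref_sd, betas, n, rng):
    """Sum of log(c_b / c_a) over consecutive pairs (a, b) in `betas` (from 0 to 1)."""
    post_var = sigma ** 2 * tau ** 2 / (sigma ** 2 + tau ** 2)
    post_mean = y * tau ** 2 / (sigma ** 2 + tau ** 2)
    ref_var = ref_sd ** 2
    log_ml = 0.0
    for b_lo, b_hi in zip(betas[:-1], betas[1:]):
        # p_beta is normal here because posterior and reference are both normal.
        precision = b_lo / post_var + (1.0 - b_lo) / ref_var
        mean = (b_lo * post_mean / post_var
                + (1.0 - b_lo) * ref_mean / ref_var) / precision
        sd = math.sqrt(1.0 / precision)
        log_terms = []
        for _ in range(n):
            theta = rng.gauss(mean, sd)
            log_q = log_normal_pdf(y, theta, sigma) + log_normal_pdf(theta, 0.0, tau)
            log_ref = log_normal_pdf(theta, ref_mean, ref_sd)
            log_terms.append((b_hi - b_lo) * (log_q - log_ref))
        # Log of the sample mean, computed stably (log-sum-exp).
        biggest = max(log_terms)
        log_ml += biggest + math.log(sum(math.exp(t - biggest) for t in log_terms) / n)
    return log_ml


if __name__ == "__main__":
    rng = random.Random(2011)
    y, sigma, tau = 0.5, 0.1, 1.0
    post_sd = math.sqrt(sigma ** 2 * tau ** 2 / (sigma ** 2 + tau ** 2))
    post_mean = y * tau ** 2 / (sigma ** 2 + tau ** 2)
    betas = [0.0, 0.01, 0.1, 0.38, 1.0]
    # Assumed reference: centered on the posterior and twice as wide.
    estimate = steppingstone_log_marginal(y, sigma, tau, post_mean, 2.0 * post_sd,
                                          betas, 5000, rng)
    print("steppingstone log marginal:", estimate)
    print("exact log marginal:        ", exact_log_marginal(y, sigma, tau))
```

Because the reference is close to the posterior, the ratio q/π varies little over each p_β, which keeps every factor in the product easy to estimate.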
9. A reference distribution over tree topologies

    We must be able to:
    1. calculate the probability for any tree topology,
    2. center the distribution on the posterior,
    3. control the "vagueness" of the distribution,
    4. efficiently sample trees from the distribution.
10. Tree-Centered Independent-Split-Probability (TCISP) distribution

    Argument: a tree with probabilities for each split.
    Result: a probability distribution over all tree topologies.
11. [Figure: focal tree on leaves A, C, D, E, F, G, H, I, J, K, L, with a probability between 0.3 and 0.9 on each internal split.]

    Input: a focal tree to center the distribution, with split probabilities.
12. [Figure: the focal tree with its branches colored blue or red.]

    We will keep the blue branches and avoid the red ones.
13. [Figure: tree on the same leaves; the topology is not recoverable from the transcript.]
14. [Figure: tree on the same leaves.]

    One of the many resolutions which avoid the red branches.
15. [Figure: two trees on the same leaves, shown side by side.]
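The slides do not spell out how the TCISP probabilities are computed, so the following is only one plausible reading of the name "independent split probability" and of the keep/avoid coloring: each split of the focal tree is kept (blue) with its own probability and avoided (red) otherwise, independently of the others. The sketch covers only that first step. Choosing a resolution that avoids the red splits is not shown, the split labels and probabilities are made up for illustration, and none of this should be taken as the Phycas implementation.

```python
import itertools
import random


def pattern_probability(split_probs, kept):
    """Probability of keeping exactly the splits in `kept`, assuming independence."""
    prob = 1.0
    for split, p in split_probs.items():
        prob *= p if split in kept else (1.0 - p)
    return prob


def sample_pattern(split_probs, rng):
    """Draw one keep/avoid pattern: each split is kept with its own probability."""
    return frozenset(split for split, p in split_probs.items() if rng.random() < p)


def all_patterns(split_probs):
    """Enumerate every subset of the focal tree's splits (feasible only for few splits)."""
    splits = list(split_probs)
    for size in range(len(splits) + 1):
        for combo in itertools.combinations(splits, size):
            yield frozenset(combo)


if __name__ == "__main__":
    # Hypothetical five-leaf focal tree with two internal splits.
    split_probs = {"AB|CDE": 0.9, "ABC|DE": 0.4}
    rng = random.Random(2011)
    print("sampled pattern:", sorted(sample_pattern(split_probs, rng)))
    for pattern in all_patterns(split_probs):
        print(sorted(pattern), pattern_probability(split_probs, pattern))
```

Under this reading, raising every split probability toward 1 concentrates the distribution on the focal tree, while lowering them makes it vaguer.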
16. Counting trees: Bryant and Steel (2009) provide an $O(n^5)$ algorithm for counting the number of trees that share no splits with another tree.

    Multitree steppingstone:
    - Works on tiny trees (≤ 6 leaves) with no tuning;
    - We are working on more efficient MCMC for larger trees;
    - Code on: https://github.com/mtholder/Phycas/tree/sampling_ref_dist
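For a sense of scale, the number of unrooted, fully resolved topologies on n labeled leaves is the standard count (2n − 5)!!. The helper below is a hypothetical illustration of how quickly tree space grows; it is not the Bryant and Steel algorithm, which counts trees sharing no splits with a given tree.

```python
def count_unrooted_binary_trees(n_leaves):
    """Number of unrooted, fully resolved topologies on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        return 1  # zero, one, or two leaves admit a single topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_leaves - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 6, 11):
        print(n, "leaves:", count_unrooted_binary_trees(n), "topologies")
```

Six leaves give 105 topologies, which is small enough to enumerate, whereas the count grows faster than exponentially as leaves are added.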
17. Conclusions
    - Do not trust the harmonic mean estimator of the marginal likelihood.
    - Take a look at Phycas: http://www.phycas.org (under GPLv2.0; source on GitHub).
    - Watch for multitree steppingstone in a more generic, usable form soon.
    - The Tree-Centered Independent-Split-Probability (TCISP) distribution may be useful in other contexts: likelihood-based supertrees, or MCMC proposals.
18. Thanks: NSF AToL and iEvoBio.

    See Xie et al. (2010), Fan et al. (2010), and Lartillot and Philippe (2006) for more discussion of estimating marginal likelihoods.
19. References
    - Bryant, D. and Steel, M. (2009). Computing the distribution of a tree metric. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6(3):420–426.
    - Fan, Y., Wu, R., Chen, M.-H., Kuo, L., and Lewis, P. O. (2010). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution (advance access).
    - Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2):195–207.
    - Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56(1):3–48.
20. References (continued)
    - Xie, W., Lewis, P. O., Fan, Y., Kuo, L., and Chen, M.-H. (2010). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.