Bayesian Phylogenetics - Systematics.pptx
1. A biologist’s guide to
Bayesian phylogenetic
analysis
Nascimento, et al. 2017
By Ryan Long
Systematics
“Nascimento’s BEST
publication since 2011”
2. Introduction
Bayesian phylogenetic models were
introduced in the 1990s, and have
revolutionized the way we analyze sequence data
Their popularity is due to (1) the development of
powerful models of data analysis and (2) the
availability of user-friendly computer
programs to apply the models
But models are becoming increasingly
complicated, and the programs themselves
are not always clear to the users
3. Why does this paper exist?
Purpose: "this paper aims to explain the
basic concepts of Bayesian statistics and
discuss the major features of MCMC
algorithms, such as the prior and the
likelihood, MCMC proposals, diagnosis of
MCMC convergence and mixing, and the
summary of the posterior sample."
Audience: the empirical biologist who
needs to use Bayesian phylogenetic
programs to analyze their data
4. What is the Bayesian method?
It is a statistical inference methodology and is
the basis for a number of programs used to
produce phylogenetic estimates
Uses Bayes' Theorem, which describes the
probability of an event based on prior
knowledge of conditions that might be related
to the event
Probability distribution: the mathematical
function that gives the possible values a
variable can take in a given range and the
probability of each
Prior (probability distribution): the probability
distribution that would express one's beliefs
about this quantity before some evidence is
taken into account.
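Bayes' theorem, mentioned above, can be written compactly as follows. The symbols are our own notation rather than the slides': θ stands for the unknown parameters (for example the tree and branch lengths) and D for the data (for example the alignment). P(θ) is the prior, P(D | θ) the likelihood, P(θ | D) the posterior, and P(D) a normalizing constant.

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
```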
5. Bayesian method cont.
Posterior distribution: the probability distribution of the unknown parameters
given the data, the model and the prior (basically, what you think post-analysis)
Summarizes all our information about what’s being examined
As we collect more data, the posterior distribution should become
more sharply peaked around a single value.
Ex: Previous data suggest 20% of systematics students explode by the end of
the semester (prior). You think it's higher, so you collect data; the updated
estimate is the posterior (50% of students will explode)
Prior belief + Data → Posterior belief
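The exploding-students example can be sketched numerically. This is only an illustration under assumptions that are not in the slides: the prior is taken to be Beta(2, 8) (mean 20%), and the data are a hypothetical class of 10 students, 8 of whom explode. With a Beta prior and binomial data the posterior is again a Beta distribution, so no sampling is needed.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def update(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return a + successes, b + failures

# Assumed prior: Beta(2, 8), i.e. a prior mean of 20% exploding students.
prior = (2, 8)

# Hypothetical data: 8 of 10 students explode.
posterior = update(*prior, 8, 2)           # Beta(10, 10)

# Hypothetical larger class with the same proportion: 80 of 100 explode.
big_posterior = update(*prior, 80, 20)     # Beta(82, 28)

prior_mean, prior_sd = beta_mean_sd(*prior)
post_mean, post_sd = beta_mean_sd(*posterior)
big_mean, big_sd = beta_mean_sd(*big_posterior)

for label, (m, s) in [("prior", (prior_mean, prior_sd)),
                      ("posterior, n=10", (post_mean, post_sd)),
                      ("posterior, n=100", (big_mean, big_sd))]:
    print(f"{label:18s} mean={m:.3f} sd={s:.3f}")
```

With these made-up numbers the posterior mean is 50%, and the standard deviation shrinks as more data are added, which is the "more sharply peaked" behaviour described above.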
6. What type of data can be used?
most common type of data in phylogenetic
analyses:
DNA and amino acid sequence
alignments
Morphological characters
Must be properly aligned before input
[Figure: multiple sequence alignment]
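As a rough illustration of what "aligned" means in practice, the sketch below checks that every sequence in a toy alignment has the same length and lists the variable columns. The taxon names and sequences are invented, and equal length is only a necessary condition: it says nothing about whether the alignment is biologically sensible.

```python
def check_alignment(alignment):
    """Return the number of sites, or raise if sequences differ in length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return lengths.pop()

def variable_sites(alignment):
    """Indices of columns with more than one character state (gaps ignored)."""
    columns = zip(*alignment.values())
    return [i for i, col in enumerate(columns) if len(set(col) - {"-"}) > 1]

# Invented toy alignment; "-" marks a gap.
toy = {
    "taxon_A": "ATGCCGTA-T",
    "taxon_B": "ATGACGTAGT",
    "taxon_C": "ATGACGTA-T",
}

n_sites = check_alignment(toy)
var_cols = variable_sites(toy)
print(n_sites, var_cols)
```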
7. How to choose a prior for Bayesian analysis
There is a certain level of subjectivity to prior selection,
and improper selection can affect results
The prior distribution can strongly affect the posterior
distribution
Default priors in software packages may not be
appropriate for one’s data
It is also necessary to specify a prior on the evolutionary
rates for the different loci or partitions
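The effect of the prior on the posterior can be seen in a small sketch. It reuses a Beta prior with binomial data, not a phylogenetic model, and the priors and counts are all assumptions chosen for illustration. The same data are analyzed under three priors, first with a small data set and then with a large one.

```python
def posterior_mean(a, b, successes, failures):
    """Posterior mean of a proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + successes + failures)

# Three assumed priors, from uninformative to strongly informative.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "weak Beta(2, 8)": (2, 8),
    "strong Beta(20, 80)": (20, 80),
}

# Hypothetical data sets with the same observed proportion (80%).
small_data = (8, 2)
large_data = (800, 200)

small = {name: posterior_mean(a, b, *small_data) for name, (a, b) in priors.items()}
large = {name: posterior_mean(a, b, *large_data) for name, (a, b) in priors.items()}

for name in priors:
    print(f"{name:20s} n=10: {small[name]:.3f}   n=1000: {large[name]:.3f}")
```

With only 10 observations the three priors give clearly different posterior means. With 1000 observations they nearly agree, which is why prior choice matters most when the data are weakly informative.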
8. What is MCMC?
Markov chain Monte Carlo (MCMC): a class of algorithms for
sampling from a probability distribution
They make it possible to fit large hierarchical models that
require integration over thousands of unknown
parameters
Need: data, a prior, and a model – then you use MCMC to take a
sample from the posterior distribution
The algorithm generates that sample, which can then be used to
estimate the mean, the standard deviation of the posterior or
even the whole posterior distribution
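The recipe above (data, prior, model, then sample) can be sketched with a minimal Metropolis sampler. This is a toy, not what any phylogenetics program does internally. The target is the posterior of a single proportion under an assumed Beta(2, 8) prior with hypothetical data of 8 successes in 10 trials, and the proposal is a uniform sliding window whose width is an arbitrary choice.

```python
import math
import random
import statistics

def log_posterior(theta, successes=8, failures=2, a=2.0, b=8.0):
    """Unnormalized log posterior: binomial likelihood times Beta(a, b) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return ((a + successes - 1) * math.log(theta)
            + (b + failures - 1) * math.log(1.0 - theta))

def metropolis(log_target, start, n_steps, window, rng):
    """Random-walk Metropolis with a symmetric uniform proposal."""
    chain = [start]
    current, current_lp = start, log_target(start)
    for _ in range(n_steps):
        proposal = current + rng.uniform(-window, window)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

rng = random.Random(1)
chain = metropolis(log_posterior, start=0.05, n_steps=20000, window=0.2, rng=rng)

# Discard the first 1000 states as burn-in, then summarize the sample.
samples = chain[1000:]
est_mean = statistics.fmean(samples)
est_sd = statistics.stdev(samples)
print(f"posterior mean ~ {est_mean:.3f}, posterior sd ~ {est_sd:.3f}")
```

Because this toy posterior is known exactly (Beta(10, 10), mean 0.5), the sample estimates can be checked against the true values. In a real phylogenetic analysis no such check is available, which is why convergence diagnostics matter.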
9. What are convergence, burn-in and mixing of
the MCMC?
Convergence rate: The MCMC should be visiting regions of high probability
within the posterior distribution. The convergence rate is the rate at which the
MCMC will move to these high-probability regions from any initial starting point
Burn-in: The initial part of the MCMC chain, before it reaches the high-probability
regions, where it is still approaching the target distribution from its starting
point. This is usually discarded.
Slow convergence rate = long burn-in
Mixing efficiency: how efficiently the MCMC samples parameters. If an
MCMC chain is mixing well, it implies that autocorrelation in the chain is low, the
effective sample size (ESS) is high and the estimates obtained are accurate. A chain
that is mixing well will have parameter traces that look like straight hairy
caterpillars, with the chain fluctuating so rapidly around the equilibrium that
there are no obvious trends
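The link between autocorrelation and ESS can be sketched as follows. The chains are simulated autoregressive series standing in for MCMC output, with `phi` controlling how strongly each state depends on the previous one. The ESS estimator is a simple one that sums autocorrelations up to the first non-positive lag. Diagnostic tools in real programs may use different estimators, so treat this as an illustration only.

```python
import random
import statistics

def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    mean = statistics.fmean(chain)
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(chain, max_lag=200):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation or at max_lag."""
    rho_sum = 0.0
    for lag in range(1, min(max_lag, len(chain) - 1) + 1):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return len(chain) / (1.0 + 2.0 * rho_sum)

def ar1_chain(n, phi, rng, start=0.0):
    """Toy stand-in for MCMC output: x[t] = phi * x[t-1] + noise."""
    chain = [start]
    for _ in range(n - 1):
        chain.append(phi * chain[-1] + rng.gauss(0.0, 1.0))
    return chain

rng = random.Random(42)
burn_in = 500  # arbitrary choice; both chains start far from equilibrium

well_mixed = ar1_chain(5000, 0.1, rng, start=20.0)[burn_in:]
poorly_mixed = ar1_chain(5000, 0.95, rng, start=20.0)[burn_in:]

ess_good = effective_sample_size(well_mixed)
ess_poor = effective_sample_size(poorly_mixed)
print(f"well mixed:   ESS ~ {ess_good:.0f} of {len(well_mixed)}")
print(f"poorly mixed: ESS ~ {ess_poor:.0f} of {len(poorly_mixed)}")
```

Both chains contain the same number of states after burn-in, but the highly autocorrelated one carries far less independent information, so its ESS is much lower.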
10. Conclusion
Bayesian phylogenetics has undergone
explosive growth in the past decade due to
powerful models and easy-to-use programs
Bayesian MCMC methods are most commonly
used “in the areas of divergence time
estimation integrating molecular,
morphological and fossil information, species
tree estimation using multi-loci genomic
sequence data, and species delimitation
incorporating genetic and
morphological/ecological information”