Bayesian Phylogenetic
Inference: Introduction
Fred(rik) Ronquist
Swedish Museum of Natural History,
Stockholm, Sweden
BIG 4 Workshop, October 9-19, Tovetorp, Sweden
Topics
 Probability 101
 Bayesian Phylogenetic Inference
 Markov chain Monte Carlo
 Bayesian Model Choice and Model Averaging
1. Probability 101
Do not worry about your
difficulties in Mathematics. I
can assure you that mine
are still greater.
Albert Einstein
Continuous probability distributions (figure panels): (α) uniform distribution, (β) beta distribution, (γ) gamma distribution, (δ) Dirichlet distribution, (ε) exponential distribution.
Discrete Probability
Sample space: ω1, ω2, ω3, ω4, ω5, ω6
Event (subset of outcomes; e.g., faces with odd numbers): E = {ω1, ω3, ω5}
Distribution function (e.g., uniform): m(ω)
Probability: Pr(E) = Σ_{ω ∈ E} m(ω)
Random variable X
Continuous Probability
Sample space: an interval (e.g., a disc with circumference 1)
Random variable X
Probability density function (pdf), e.g. Uniform(0,1): f(x)
Event (a subspace of the sample space): E = [a, b)
Probability: Pr(E) = ∫_{x ∈ E} f(x) dx
Continuous Probability Distributions
• Uniform distribution
• Beta distribution
• Gamma distribution
• Dirichlet distribution
• Exponential distribution
• Normal distribution
• Lognormal distribution
• Multivariate normal distribution
Discrete Probability Distributions
• Bernoulli distribution
• Categorical distribution
• Binomial distribution
• Multinomial distribution
• Poisson distribution
Stochastic Processes
• Markov chain
• Poisson process
• Birth-death process
• Coalescent
• Dirichlet process mixture
Exponential Distribution
X ~ Exp(λ)
Parameters: λ = rate (of decay)
Probability density function: f(x) = λ e^(−λx)
Mean: 1/λ
Gamma Distribution
X ~ Gamma(α, β)
Parameters: α = shape, β = inverse scale
Probability density function: f(x) ∝ x^(α−1) e^(−βx)
Mean: α/β
Scaled gamma: α = β (so that the mean is 1)
Beta Distribution
X ~ Beta(α1, α2)
Parameters: α1, α2 = shape parameters
Probability density function: f(x) ∝ x^(α1−1) (1 − x)^(α2−1)
Mode: (α1 − 1) / Σ_i (αi − 1)
Defined on two proportions of a whole (a simplex)
Dirichlet Distribution
X ~ Dir(α), α = {α1, α2, ..., αk}
Parameters: α = vector of k shape parameters
Probability density function: f(x) ∝ Π_i x_i^(αi−1)
Defined on k proportions of a whole
Examples: Dir(1,1,1,1) (flat) and Dir(300,300,300,300) (concentrated around equal proportions)
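For readers who want to play with these distributions, here is a minimal NumPy sketch (not from the slides) drawing samples from the exponential, gamma, beta and Dirichlet distributions; the parameter values are arbitrary examples.

```python
# Sampling from the continuous distributions listed above, to get a feel for them.
import numpy as np

rng = np.random.default_rng(seed=1)

x_exp = rng.exponential(scale=1.0, size=10_000)                 # Exp(1): mean 1/lambda = 1
x_gamma = rng.gamma(shape=2.0, scale=1.0 / 2.0, size=10_000)    # Gamma(alpha=2, beta=2): mean alpha/beta = 1
x_beta = rng.beta(a=2.0, b=5.0, size=10_000)                    # Beta(2, 5): mode (2-1)/(2+5-2) = 0.2
x_dir = rng.dirichlet(alpha=[1.0, 1.0, 1.0, 1.0], size=10_000)  # flat Dirichlet on 4 proportions

print(x_exp.mean(), x_gamma.mean(), x_beta.mean(), x_dir.mean(axis=0))
```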
Conditional Probability
Pr(H) = 4/10 = 0.4
Pr(D) = 3/10 = 0.3
Joint probability: Pr(D, H) = 2/10 = 0.2
Conditional probability: Pr(D | H) = 2/4 = 0.5
Bayes’ Rule
Reverend Thomas Bayes (1701-1761)
Pr(A | B) ⇒ Pr(B | A) ?
Bayes’ Rule
The joint probability can be factored in two ways:
  Pr(D, H) = Pr(D) Pr(H | D) = 3/10 × 2/3 = 2/10 = 0.2
  Pr(D, H) = Pr(H) Pr(D | H) = 4/10 × 2/4 = 2/10 = 0.2
Since Pr(D) Pr(H | D) = Pr(H) Pr(D | H), dividing by Pr(D) gives Bayes’ rule:
  Pr(H | D) = Pr(H) Pr(D | H) / Pr(D)
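The numbers above can be checked with a few lines of Python; the counts (10 objects, 4 with property H, 3 with property D, 2 with both) are read off the example.

```python
# Verify the two factorizations of the joint probability and apply Bayes' rule.
n, n_H, n_D, n_DH = 10, 4, 3, 2

pr_H, pr_D = n_H / n, n_D / n
pr_D_given_H = n_DH / n_H                     # 0.5
pr_H_given_D = pr_H * pr_D_given_H / pr_D     # Bayes' rule: 0.4 * 0.5 / 0.3 = 2/3

# Both factorizations give the same joint probability (0.2):
assert abs(pr_D * pr_H_given_D - pr_H * pr_D_given_H) < 1e-12
print(pr_H_given_D)
```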
Maximum Likelihood Inference
Data D; model M with parameters θ
We can calculate Pr(D | θ) or f(D | θ)
Define the likelihood function L(θ) ∝ f(D | θ)
Maximum likelihood: find the value of θ that maximizes L(θ)
Confidence: asymptotic behavior, more samples, bootstrapping
Bayesian Inference
Data D; model M with parameters θ
We can calculate Pr(D | θ) or f(D | θ)
We are actually interested in Pr(θ | D) or f(θ | D)
Bayes’ rule:
  f(θ | D) = f(θ) f(D | θ) / f(D)
where f(θ | D) is the posterior density, f(θ) the prior density, f(D | θ) the ”likelihood”, and f(D) the normalizing constant (the marginal likelihood of the data, or model likelihood):
  f(D) = ∫ f(θ) f(D | θ) dθ
Coin Tossing Example
What is the probability of
your favorite team winning
the next ice hockey World
Championships?
World Championship Medalists 2007-2016 (figure): gold, silver and bronze medalists per year, summarized as a medal count per team (teams shown as flags in the original; the remaining teams are pooled as ”other”).

Prior f(θ): weights proportional to the medal counts (2, 5, 8, 1, 5, 5, 3, 1, other).
Data 1: which teams are in and which are out of the medal round.
Posterior 1, f(θ | D1): teams that are out drop to zero (2, 0, 8, 0, 0, 5, 3, 0).
Data 2: which team won.
Posterior 2, f(θ | D2): all weight on the winner (0, 0, 0, 0, 0, 5, 0, 0).
Updating the prior f(θ) with both data sets gives the posterior f(θ | D1 + D2).
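A minimal sketch of the discrete updating in this example; the team labels and the likelihood values for ”in”/”out” are hypothetical placeholders, and only the prior-times-likelihood-then-renormalize logic follows the slides.

```python
import numpy as np

teams = ["team A", "team B", "team C", "other"]   # hypothetical labels
medals = np.array([8.0, 5.0, 3.0, 2.0])           # hypothetical medal counts

prior = medals / medals.sum()                     # f(theta): proportional to past success

# Data 1: suppose teams A and B reached the medal round; the others did not.
# Assumed likelihoods: 1 for "in", 0.1 for "out".
like1 = np.array([1.0, 1.0, 0.1, 0.1])
post1 = prior * like1
post1 /= post1.sum()                              # f(theta | D1)

# Data 2: suppose team B won. Likelihood 1 for the winner, 0 otherwise.
like2 = np.array([0.0, 1.0, 0.0, 0.0])
post2 = post1 * like2
post2 /= post2.sum()                              # f(theta | D1, D2): all mass on the winner

print(dict(zip(teams, post2.round(3))))
```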
Learn more:
• Wikipedia (good texts on most statistical distributions,
sometimes a little difficult)
• Grinstead & Snell: Introduction to Probability.
American Mathematical Society. Free pdf available
from:
http://www.dartmouth.edu/~chance/teaching_aids/books_art
• Team up with a statistician or a computational /
theoretical evolutionary biologist!
2. Bayesian Phylogenetic
Inference
Infer relationships among three species: A, B, C (plus an outgroup).
Three possible trees (topologies).
Prior distribution over the three trees (probabilities summing to 1.0) → combined with the Data (observations) and the Model → Posterior distribution over the three trees (probabilities summing to 1.0).
The data D
Taxon Characters
A ACG TTA TTA AAT TGT CCT CTT TTC AGA
B ACG TGT TTC GAT CGT CCT CTT TTC AGA
C ACG TGT TTA GAC CGA CCT CGG TTA AGG
D ACA GGA TTA GAT CGT CCG CTT TTC AGA
Model: topology AND branch lengths
Parameters: θ = (τ, v)
  topology (τ)
  branch lengths (v_i), the expected amount of change
(The example tree has four taxa, A-D, with five branch lengths v1-v5.)
Model: molecular evolution
Parameters θ: instantaneous rate matrix (Jukes-Cantor)

  Q =
        [A]  [C]  [G]  [T]
  [A]    −    µ    µ    µ
  [C]    µ    −    µ    µ
  [G]    µ    µ    −    µ
  [T]    µ    µ    µ    −
Model: molecular evolution
Probabilities are calculated using the transition probability matrix P:
  P(v) = e^(Qv)
which for Jukes-Cantor gives, for a branch of length v:
  1/4 − 1/4 e^(−4v/3)   (change)
  1/4 + 3/4 e^(−4v/3)   (no change)
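As a check of the closed form above, here is a short sketch comparing P(v) = e^(Qv), computed with SciPy’s matrix exponential, to the Jukes-Cantor formulas; the scaling assumes branch lengths measured in expected substitutions per site.

```python
import numpy as np
from scipy.linalg import expm

mu = 1.0 / 3.0                                    # off-diagonal rate, scaled so that the
Q = np.full((4, 4), mu)                           # total rate away from a state is 3*mu = 1,
np.fill_diagonal(Q, -3.0 * mu)                    # i.e. v = expected substitutions per site

def p_closed_form(v):
    """Closed-form Jukes-Cantor transition probabilities for branch length v."""
    same = 0.25 + 0.75 * np.exp(-4.0 * v / 3.0)   # (no change)
    diff = 0.25 - 0.25 * np.exp(-4.0 * v / 3.0)   # (change to one specific other base)
    P = np.full((4, 4), diff)
    np.fill_diagonal(P, same)
    return P

v = 0.3
print(np.allclose(expm(Q * v), p_closed_form(v)))  # True: both give P(v)
```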
Priors on parameters
 Topology
 all unique topologies have equal
probability
 Branch lengths
 exponential prior (puts more weight on
small branch lengths)
Scale matters in priors
With a branch length prior v ~ Exp(1), the prior on v induces a prior on the probability of a substitution along the branch, p = 3/4 (1 − e^(−4v/3)) (plotted as probability of substitution against branch length).
The effect on the data likelihood is what matters most; Jeffreys’ uninformative priors formalize this.
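A small Monte Carlo illustration (not from the slides) of why scale matters: the prior induced on the probability of substitution differs substantially between an Exp(1) and an Exp(10) prior on the branch length.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_substitution(v):
    return 0.75 * (1.0 - np.exp(-4.0 * v / 3.0))   # Jukes-Cantor probability of substitution

for rate in (1.0, 10.0):                           # Exp(rate) prior, mean branch length 1/rate
    v = rng.exponential(scale=1.0 / rate, size=100_000)
    p = prob_substitution(v)
    print(f"Exp({rate:g}) prior: mean prior prob. of substitution = {p.mean():.3f}")
```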
Bayes’ theorem
  f(θ | D) = f(θ) f(D | θ) / ∫ f(θ) f(D | θ) dθ
Posterior distribution = prior distribution × ”likelihood” / normalizing constant
D = data; θ = model parameters
Posterior probability distribution f(θ | X) over the parameter space (figure: posterior probability plotted against θ, with regions of parameter space corresponding to tree 1, tree 2 and tree 3).
tree 1 tree 2 tree 3
20% 48% 32%
We can focus on any parameter of interest
(there are no nuisance parameters) by
marginalizing the posterior over the other
parameters (integrating out the
uncertainty in the other parameters)
(Percentages denote marginal probability distribution on trees)
Why is it called marginalizing?

          tree 1 (τ1)   tree 2 (τ2)   tree 3 (τ3)
  ν1         0.10          0.07          0.12       |  0.29
  ν2         0.05          0.22          0.06       |  0.33
  ν3         0.05          0.19          0.14       |  0.38
             0.20          0.48          0.32

Rows are branch length vectors (ν), columns are trees (τ). The cell entries are joint probabilities; the row and column sums in the margins are the marginal probabilities (the bottom row is the marginal probability distribution on trees shown above).
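The marginals in the table can be recomputed directly from the joint probabilities, for example:

```python
import numpy as np

# rows: branch-length vectors nu1..nu3; columns: trees tau1..tau3
joint = np.array([[0.10, 0.07, 0.12],
                  [0.05, 0.22, 0.06],
                  [0.05, 0.19, 0.14]])

print(joint.sum(axis=0))   # marginal over trees:           [0.20 0.48 0.32]
print(joint.sum(axis=1))   # marginal over branch lengths:  [0.29 0.33 0.38]
print(joint.sum())         # joint probabilities sum to 1
```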
Markov chain Monte Carlo
 1. Start at an arbitrary point
 2. Make a small random move
 3. Calculate the height ratio (r) of the new state to the old state:
    r > 1: the new state is accepted (always accept)
    r < 1: the new state is accepted with probability r (accept sometimes); if the new state is not accepted, stay in the old state
 4. Go to step 2

The proportion of time the MCMC procedure samples from a particular parameter region is an estimate of that region’s posterior probability density (here 20 %, 48 % and 32 % for tree 1, tree 2 and tree 3).
Metropolis algorithm
Assume that the current state has parameter values θ.
Consider a move to a state with parameter values θ*.
The height ratio r is (prior ratio × likelihood ratio):

  r = f(θ* | D) / f(θ | D)
    = [ f(θ*) f(D | θ*) / f(D) ] / [ f(θ) f(D | θ) / f(D) ]
    = [ f(θ*) / f(θ) ] × [ f(D | θ*) / f(D | θ) ]
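A minimal, generic Metropolis sampler sketch for a single continuous parameter (a toy stand-in with made-up data; real phylogenetic MCMC proposes changes to topologies and branch lengths), using the acceptance ratio above on the log scale.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_posterior(theta):
    """Unnormalized log posterior: log prior + log likelihood (toy model)."""
    if theta <= 0.0:
        return -np.inf                               # outside the prior's support
    data = np.array([0.8, 1.1, 0.9, 1.3])            # toy data, assumed Exp(theta)-distributed
    log_prior = -theta                               # Exp(1) prior, up to a constant
    log_like = len(data) * np.log(theta) - theta * data.sum()
    return log_prior + log_like

def metropolis(n_gen=20_000, step=0.5):
    theta = 1.0                                      # 1. start at an arbitrary point
    samples = np.empty(n_gen)
    for i in range(n_gen):
        proposal = theta + rng.normal(0.0, step)     # 2. make a small random move
        log_r = log_posterior(proposal) - log_posterior(theta)
        if np.log(rng.random()) < log_r:             # 3. accept with probability min(1, r)
            theta = proposal                         #    otherwise stay in the old state
        samples[i] = theta                           # record current state; 4. go to step 2
    return samples

samples = metropolis()
print(samples[5_000:].mean())                        # posterior mean after discarding burn-in
```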
MCMC Sampling Strategies
 Great freedom of strategies:
 Typically one or a few related parameters
changed at a time
 You can cycle through parameters
systematically or choose randomly
One ”generation” or ”iteration” or ”cycle” can include a single randomly chosen proposal (or move, operator, kernel), one proposal for each parameter, or a block of randomly chosen proposals
Trace Plot
burn-in phase, followed by the stationary phase, which is sampled with thinning (rapid mixing essential)
Majority rule
consensus tree
Frequencies
represent the
posterior
probability of
the clades
Probability of
clade being true
given data and
model
Summarizing Trees
 Maximum posterior probability tree (MAP tree)
 can be difficult to estimate precisely
 can have low probability
 Majority rule consensus tree
 easier to estimate clade probabilities exactly
 branch length distributions can be summarized across all
trees with the branch
 can hide complex topological dependence
 branch length distributions can be multimodal
 Credible sets of trees
 Include trees in order of decreasing probability to obtain, e.g., a 95 % credible set (see the sketch below)
 “Median” or “central” tree
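A sketch of building a credible set as described above; the tree labels and posterior probabilities are hypothetical.

```python
# Add trees in order of decreasing posterior probability until the cumulative
# probability reaches the desired level (e.g. 0.95).
def credible_set(tree_probs, level=0.95):
    ranked = sorted(tree_probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for tree, p in ranked:
        chosen.append(tree)
        total += p
        if total >= level:
            break
    return chosen, total

probs = {"tree1": 0.48, "tree2": 0.32, "tree3": 0.12, "tree4": 0.05, "tree5": 0.03}
print(credible_set(probs))    # (['tree1', 'tree2', 'tree3', 'tree4'], 0.97)
```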
Adding Model Complexity
Parameters: topology (τ), branch lengths (v_i) (the example tree has taxa A-D and branches v1-v5), and a General Time Reversible (GTR) substitution model with instantaneous rate matrix Q (rows and columns in the order A, C, G, T):

  [A]      −        r_AC π_C   r_AG π_G   r_AT π_T
  [C]   r_AC π_A       −       r_CG π_G   r_CT π_T
  [G]   r_AG π_A   r_CG π_C       −       r_GT π_T
  [T]   r_AT π_A   r_CT π_C   r_GT π_G       −

with exchangeabilities r_ij and stationary frequencies π_i.
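A sketch of how a GTR rate matrix can be assembled from exchangeabilities and stationary frequencies; the parameter values are made up, and the matrix is rescaled so that branch lengths are in expected substitutions per site.

```python
import numpy as np

freqs = np.array([0.3, 0.2, 0.2, 0.3])                      # pi_A, pi_C, pi_G, pi_T
exch = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 1.0,  # r_AC, r_AG, ... (made-up values)
        ("C", "G"): 1.0, ("C", "T"): 4.0, ("G", "T"): 1.0}

bases = ["A", "C", "G", "T"]
Q = np.zeros((4, 4))
for i, x in enumerate(bases):
    for j, y in enumerate(bases):
        if i != j:
            r = exch[(x, y)] if (x, y) in exch else exch[(y, x)]
            Q[i, j] = r * freqs[j]                          # q_ij = r_ij * pi_j
np.fill_diagonal(Q, -Q.sum(axis=1))                         # rows sum to zero

Q /= -np.sum(freqs * np.diag(Q))                            # 1 expected substitution/site/unit time
print(Q.round(3))
```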
Adding Model Complexity
Gamma-shaped
rate variation
across sites
Priors on Parameters
 Stationary state frequencies
 Flat Dirichlet, Dir(1,1,1,1)
 Exchangeability parameters
 Flat Dirichlet, Dir(1,1,1,1,1,1)
 Shape parameter of scaled gamma
distribution of rate variation across
sites
 Uniform Uni(0,50)
Mean and 95%
credibility
interval for model
parameters
Summarizing Variables
 Mean, median, variance common
summaries
 95 % credible interval: discard the
lowest 2.5 % and highest 2.5 % of
sampled values
 95 % region of highest posterior
density (HPD): find smallest region
containing 95 % of probability
Credible intervals and HPDs (figure): a 95 % equal-tailed credible interval compared with a 95 % HPD region, which for a multimodal density can consist of more than one interval; see the sketch below.
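A sketch computing both summaries from a vector of posterior samples (the samples here are simulated stand-ins).

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.gamma(shape=2.0, scale=0.5, size=20_000)      # stand-in posterior samples

# Equal-tailed 95% credible interval: drop the lowest and highest 2.5% of values.
ci = np.percentile(samples, [2.5, 97.5])

# 95% HPD: the shortest single interval containing 95% of the sorted samples.
sorted_s = np.sort(samples)
n = len(sorted_s)
k = int(np.floor(0.95 * n))
widths = sorted_s[k:] - sorted_s[:n - k]
i = np.argmin(widths)
hpd = (sorted_s[i], sorted_s[i + k])

print("credible interval:", ci.round(3), " HPD:", np.round(hpd, 3))
```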
Other Sampling Methods
 Gibbs sampling: sample from the conditional posterior (a variant of
the Metropolis algorithm)
 Metropolized Gibbs sampling: more efficient variant of Gibbs
sampling of discrete characters
 Slice sampling: less prone to get stuck in local optima than the
Metropolis algorithm
 Hamiltonian sampling: a technique for reducing the difficulty of sampling correlated parameters
 Simulated annealing: increase ”greediness” during the burn-in
phase of MCMC sampling
 Data augmentation techniques: add parameters to facilitate
probability calculations
 Sequential Monte Carlo techniques: generate a sample of complete states by building up sets of particles from incomplete states
Sequential Monte Carlo Algorithm for Phylogenetics
Bouchard et al. 2012. Syst. Biol.
3. Markov chain Monte
Carlo
Convergence and Mixing
 Convergence is the degree to which
the chain has converged onto the
target distribution
 Mixing is the speed with which the
chain covers the region of interest in
the target distribution
Assessing Convergence
 Plateau in the trace plot
 Look at sampling behavior within the
run (autocorrelation time, effective
sample size)
 Compare independent runs with
different, randomly chosen starting
points
Convergence within Run
Autocorrelation time:  t = 1 + 2 Σ_{k=1}^{∞} ρ_k(θ)
where ρ_k(θ) is the autocorrelation in the MCMC samples for a lag of k generations.
Effective sample size (ESS):  e = n / t
where n is the total sample size (number of generations).
Good mixing when t is small and e is large.
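A sketch estimating t and the ESS from a single chain; truncating the infinite sum at the first negative autocorrelation estimate is a common practical convention, not something specified in the slides.

```python
import numpy as np

def effective_sample_size(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    var = x.var()
    t = 1.0
    for k in range(1, min(n // 2, 1000)):                  # truncate the sum over lags
        rho_k = np.dot(x[:-k], x[k:]) / ((n - k) * var)    # autocorrelation at lag k
        if rho_k < 0.0:                                    # stop at the first negative estimate
            break
        t += 2.0 * rho_k                                   # t = 1 + 2 * sum_k rho_k(theta)
    return n / t, t                                        # ESS e = n / t, autocorrelation time t

# Example: an AR(1) chain with lag-1 autocorrelation 0.9; its theoretical
# autocorrelation time is (1 + 0.9) / (1 - 0.9) = 19.
rng = np.random.default_rng(7)
chain = np.zeros(20_000)
for i in range(1, len(chain)):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()
print(effective_sample_size(chain))                        # ESS well below 20,000
```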
Convergence among Runs
 Tree topology:
 Compare clade probabilities (split frequencies)
 Average standard deviation of split frequencies above some cut-off (min. 10 % in at least one run). Should go to 0 as runs converge (see the sketch after this list).
 Continuous variables
 Potential scale reduction factor (PSRF).
Compares variance within and between runs.
Should approach 1 as runs converge.
 Assumes overdispersed starting points
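A sketch of the average standard deviation of split frequencies for two runs, using the 10 % cut-off mentioned above; the split labels and frequencies are hypothetical.

```python
import numpy as np

run1 = {"AB|CD": 0.92, "AC|BD": 0.05, "AD|BC": 0.03, "ABC|D": 0.55}
run2 = {"AB|CD": 0.88, "AC|BD": 0.09, "AD|BC": 0.03, "ABC|D": 0.61}

def asdsf(freqs1, freqs2, cutoff=0.10):
    splits = set(freqs1) | set(freqs2)
    sds = []
    for s in splits:
        f1, f2 = freqs1.get(s, 0.0), freqs2.get(s, 0.0)
        if max(f1, f2) >= cutoff:             # keep splits at >= 10% frequency in at least one run
            sds.append(np.std([f1, f2]))      # SD of the split frequency across the runs
    return np.mean(sds)

print(round(asdsf(run1, run2), 4))            # should approach 0 as the runs converge
```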
(Figures: clade probability in analysis 2 plotted against clade probability in analysis 1.)
Sampled value vs. target distribution (three panels):
• Too modest proposals: acceptance rate too high, poor mixing
• Too bold proposals: acceptance rate too low, poor mixing
• Moderately bold proposals: intermediate acceptance rate, good mixing
Tuning Proposals
 Manually by changing tuning parameters
 Increase the boldness of a proposal if
acceptance rate is too high
 Decrease the boldness of a proposal if
acceptance rate is too low
 Auto-tuning
 Tuning parameters are adjusted automatically by the MCMC procedure to reach a target acceptance rate
Metropolis-coupled Markov chain Monte Carlo, a. k. a. MCMCMC, a. k. a. (MC)3
(Figure sequence: a cold chain and a heated chain run in parallel; from time to time a swap of states between the two is proposed. After an unsuccessful swap both chains stay where they are; after a successful swap the cold chain takes over the hot chain’s position, which helps it move between peaks in the posterior.)
Incremental Heating
  T = 1 / (1 + λi),  i ∈ {0, 1, ..., n − 1}
where T is the temperature and λ is the heating coefficient.
Example for λ = 0.2:

  i   T      Distribution
  0   1.00   f(θ | X)^1.00   (cold chain)
  1   0.83   f(θ | X)^0.83
  2   0.71   f(θ | X)^0.71   (heated chains)
  3   0.62   f(θ | X)^0.62
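The temperature ladder for the example above can be reproduced directly:

```python
# Incremental heating: T_i = 1 / (1 + lambda * i) for lambda = 0.2 and four chains.
lam, n_chains = 0.2, 4

for i in range(n_chains):
    T = 1.0 / (1.0 + lam * i)
    role = "cold chain" if i == 0 else "heated chain"
    print(f"chain {i}: T = {T:.2f}  samples f(theta | X)^{T:.2f}  ({role})")
```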
4. Bayesian Model Choice
Bayesian Model Sensitivity
Models, models, models
 Alignment-free models
 Heterogeneity in substitution rates and stationary
frequencies across sites and lineages
 Relaxed clock models
 Models for morphology and biogeography
 Sampling across model space, e.g. GTR space and
partition space
 Models of dependence across sites according to
3D structure of proteins
 Positive selection models
 Amino acid models
 Models for population genetics and phylogeography
Bayes’ Rule
  f(θ | D) = f(θ) f(D | θ) / ∫ f(θ) f(D | θ) dθ = f(θ) f(D | θ) / f(D)
where f(D) is the marginal likelihood (of the data).
We have implicitly conditioned on a model:
  f(θ | D, M) = f(θ | M) f(D | θ, M) / f(D | M)
Bayesian Model Choice
Posterior model odds:  [ f(M1) f(D | M1) ] / [ f(M0) f(D | M0) ]
Bayes factor:  B10 = f(D | M1) / f(D | M0)
Bayesian Model Choice
 The normalizing constant in Bayes’ theorem, the marginal likelihood of the data, f(D) or f(D|M), can be used for model choice
 f(D|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run (see the sketch after this list). Thermodynamic integration and stepping-stone sampling are computationally more complex but more accurate methods
 Any models can be compared: nested, non-nested, data-derived; it is just a probability comparison
 No correction for number of parameters
 Can prefer a simpler model over a more complex model
 Critical values in Kass and Raftery (1995)
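A sketch of the harmonic mean estimator mentioned above, computed on the log scale for numerical stability; the sampled log likelihoods are simulated stand-ins, and note that this estimator is known to be unstable, which is why stepping-stone and thermodynamic integration are preferred.

```python
import numpy as np

def log_marginal_likelihood_hm(log_likes):
    """Log of the harmonic mean estimate of f(D | M) from sampled log likelihoods."""
    log_likes = np.asarray(log_likes, dtype=float)
    # harmonic mean: 1 / mean(1 / L_i); computed on the log scale via a log-sum-exp trick
    m = np.max(-log_likes)
    log_mean_inverse_like = m + np.log(np.mean(np.exp(-log_likes - m)))
    return -log_mean_inverse_like

# Hypothetical log likelihood values sampled during an MCMC run:
rng = np.random.default_rng(11)
sampled_log_likes = -2500.0 + rng.normal(0.0, 3.0, size=5_000)
print(log_marginal_likelihood_hm(sampled_log_likes))
```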
Simple Model Wins (from Lewis, 2008)
Bayes Factor Comparisons
Interpretation of the Bayes factor:

  2 ln(B10)   B10          Evidence against M0
  0 to 2      1 to 3       Not worth more than a bare mention
  2 to 6      3 to 20      Positive
  6 to 10     20 to 150    Strong
  > 10        > 150        Very strong
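A small helper applying the interpretation table to a given Bayes factor:

```python
import math

def interpret_bayes_factor(b10):
    score = 2.0 * math.log(b10)          # 2 ln(B10)
    if score < 2.0:
        return "not worth more than a bare mention"
    if score < 6.0:
        return "positive evidence against M0"
    if score < 10.0:
        return "strong evidence against M0"
    return "very strong evidence against M0"

print(interpret_bayes_factor(25.0))      # 2 ln(25) ~ 6.4 -> "strong evidence against M0"
```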
Bayesian Software
 Model testing
 ModelTest
 MrModelTest
 MrAIC
 Convergence diagnostics
 AWTY
 Tracer
 Phylogenetic inference
 MrBayes
 BEAST
 BayesPhylogenies
 PhyloBayes
 Phycas
 BAMBE
 RevBayes
 Specialized inference
 PHASE
 BAliPhy
 BayesTraits
 Badger
 BEST
 *BEAST
 CoEvol
 Tree drawing
 TreeView
 FigTree
Listening to lectures,
after a certain age,
diverts the mind too much
from its creative pursuits.
Any scientist who attends
too many lectures and
uses her own brain too
little falls into lazy habits
of thinking.
after Albert Einstein