This talk presents the "Beta with spikes" approach for modelling allele frequency data under the Wright Fisher model of genetic drift, mutation, and selection. The method recursively calculates the mean and variance of the allele frequency, together with the loss and fixation probabilities, and approximates the allele frequency distribution by a Beta distribution with additional point masses (spikes) at 0 and 1. The approximation is consistent; in quality it ranks between the diffusion approximation and the plain Beta approximation. The Beta with spikes approach can be used to infer population genetic parameters, such as split times, from DNA sequence data.
1. Betaspikes
Modelling allele frequency data under the Wright Fisher model of drift, mutation and selection
The Beta distribution approach
Paula Tataru
Aarhus University, Bioinformatics Research Centre
Aarhus, October 23rd 2014
Joint work with Thomas Bataillon and Asger Hobolth
2. Allele frequencies: the Beta distribution approach
Paula Tataru paula@birc.au.dk
Motivation
›Inference of population parameters from DNA data
› mutation rates
› selection coefficients
› split times
› variable population size back in time
›Backward in time (coalescent)
›Forward in time (Wright Fisher)
The Wright Fisher model: Drift only
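A minimal sketch of the drift-only Wright Fisher step, assuming a population of 2N gene copies in which the next generation is obtained by binomial resampling at the current allele frequency. The function names `wf_drift_step` and `simulate_drift` are illustrative and do not come from the talk.

```python
import random

def wf_drift_step(x, two_n, rng):
    """One generation of pure drift: resample 2N gene copies at frequency x."""
    copies = sum(1 for _ in range(two_n) if rng.random() < x)
    return copies / two_n

def simulate_drift(x0, two_n, generations, seed=0):
    """Return the allele frequency trajectory x_0, ..., x_t."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(generations):
        path.append(wf_drift_step(path[-1], two_n, rng))
    return path

path = simulate_drift(x0=0.5, two_n=100, generations=50)
```

Under drift only, the frequencies 0 (loss) and 1 (fixation) are absorbing: once reached, the trajectory stays there.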
The Wright Fisher model: Mutations
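A minimal sketch of a Wright Fisher step with recurrent mutation, assuming two alleles A and a, with x the frequency of A. The parameter names `u` (probability that A mutates to a) and `v` (probability that a mutates to A) are assumptions made for illustration; the talk's own notation is not preserved here.

```python
import random

def mutate(x, u, v):
    """Expected frequency of allele A after one round of mutation."""
    return (1.0 - u) * x + v * (1.0 - x)

def wf_mutation_step(x, two_n, u, v, rng):
    """Mutation followed by binomial resampling of 2N gene copies."""
    p = mutate(x, u, v)
    copies = sum(1 for _ in range(two_n) if rng.random() < p)
    return copies / two_n

rng = random.Random(1)
x_next = wf_mutation_step(0.0, two_n=200, u=0.01, v=0.02, rng=rng)
```

With u and v both positive, the boundaries 0 and 1 are no longer absorbing, and the deterministic map `mutate` has the fixed point v / (u + v).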
The Wright Fisher model: Selection
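A minimal sketch of a Wright Fisher step with selection. It assumes the simplest genic (haploid-style) selection scheme, in which allele A has relative fitness 1 + s and allele a has fitness 1; the talk may use a different parameterization, and the name `s` is an assumption.

```python
import random

def select(x, s):
    """Frequency of allele A after selection (fitness 1 + s for A, 1 for a)."""
    return x * (1.0 + s) / (1.0 + s * x)

def wf_selection_step(x, two_n, s, rng):
    """Selection followed by binomial resampling of 2N gene copies."""
    p = select(x, s)
    copies = sum(1 for _ in range(two_n) if rng.random() < p)
    return copies / two_n

rng = random.Random(2)
x_next = wf_selection_step(0.5, two_n=200, s=0.05, rng=rng)
```

Selection changes the expected frequency but leaves 0 and 1 absorbing, so loss and fixation still occur.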
Allele frequency distribution: Drift only
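For small populations, the exact drift-only allele frequency distribution can be computed by propagating the Wright Fisher Markov chain. The sketch below assumes 2N gene copies and a known starting count `i0`; the function names are illustrative.

```python
from math import comb

def transition_matrix(two_n):
    """Row i holds P(j copies next generation | i copies now) under pure drift."""
    rows = []
    for i in range(two_n + 1):
        p = i / two_n
        rows.append([comb(two_n, j) * p**j * (1.0 - p)**(two_n - j)
                     for j in range(two_n + 1)])
    return rows

def distribution_after(i0, two_n, generations):
    """Distribution of the allele count after the given number of generations."""
    matrix = transition_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[i0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist

dist = distribution_after(i0=10, two_n=20, generations=10)
p_loss, p_fix = dist[0], dist[-1]
```

The entries `dist[0]` and `dist[-1]` are the point masses at loss and fixation; they grow over time and are what a continuous density on (0, 1) alone cannot represent.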
Approximations to the Wright Fisher
›Diffusion
› Kimura 1964
› Gautier & Vitalis 2013
› Malaspinas et al. 2012
› Steinrücken et al. 2013
› Zhao et al. 2013
›Moment based
› Normal distribution
  › Nicholson et al. 2002
  › Pickrell & Pritchard 2012
› Beta distribution
  › Balding & Nichols 1995
  › Sirén et al. 2011
› Beta with spikes
The Beta approximation: Main idea
›The density of Xt
›Use recursive approach to calculate
› mean and variance
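Once the mean and variance of Xt are available, a Beta distribution can be fitted by matching the first two moments. A minimal sketch, with the function name `beta_from_moments` chosen for illustration:

```python
def beta_from_moments(mean, var):
    """Shape parameters (alpha, beta) of the Beta distribution with this mean and variance."""
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < var < mean * (1 - mean)")
    scale = mean * (1.0 - mean) / var - 1.0
    return mean * scale, (1.0 - mean) * scale

alpha, beta = beta_from_moments(0.5, 0.05)
```

The constraint on the variance reflects that no distribution on [0, 1] with a given mean can have variance above mean * (1 - mean), which is attained only by a distribution concentrated on 0 and 1.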
The Beta approximation: Drift only
The Beta approximation: Drift only
The Beta approximation: Drift only
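Under drift only, the recursion for the mean and variance follows from the law of total variance applied to binomial resampling of 2N gene copies. The sketch below assumes a known starting frequency `x0` and checks the recursion against the standard closed form; the function names are illustrative.

```python
def drift_moments(x0, two_n, generations):
    """Mean and variance of X_t under pure drift, computed recursively."""
    mean, var = x0, 0.0
    for _ in range(generations):
        # Var[X_{t+1}] = E[X_t (1 - X_t)] / 2N + Var[X_t]; the mean is unchanged
        var = mean * (1.0 - mean) / two_n + (1.0 - 1.0 / two_n) * var
    return mean, var

def drift_variance_closed_form(x0, two_n, generations):
    """Closed form: x0 (1 - x0) (1 - (1 - 1/2N)^t)."""
    return x0 * (1.0 - x0) * (1.0 - (1.0 - 1.0 / two_n) ** generations)

mean, var = drift_moments(x0=0.3, two_n=200, generations=50)
```

As t grows, the variance approaches x0 (1 - x0), the value reached when all probability mass sits at 0 and 1.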
The Beta with spikes: Main idea
›The density of Xt
›Use recursive approach to calculate
› mean and variance
› loss and fixation probabilities
› mean and variance conditional on polymorphism
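The quantities listed above are linked by simple identities: removing the point masses at 0 and 1 from the overall moments gives the moments conditional on polymorphism, to which a Beta distribution can then be fitted. A minimal sketch, assuming the overall mean and variance and the two spike probabilities are already known; all function names are illustrative.

```python
def conditional_moments(mean, var, p_loss, p_fix):
    """Mean and variance of X_t conditional on 0 < X_t < 1."""
    p_poly = 1.0 - p_loss - p_fix
    second = var + mean**2                      # E[X^2]
    cond_mean = (mean - p_fix) / p_poly         # the spike at 1 contributes p_fix to E[X]
    cond_second = (second - p_fix) / p_poly     # and p_fix to E[X^2]
    return cond_mean, cond_second - cond_mean**2

def beta_with_spikes(mean, var, p_loss, p_fix):
    """Return (p_loss, p_fix, alpha, beta) describing the spikes and the Beta part."""
    cond_mean, cond_var = conditional_moments(mean, var, p_loss, p_fix)
    scale = cond_mean * (1.0 - cond_mean) / cond_var - 1.0
    return p_loss, p_fix, cond_mean * scale, (1.0 - cond_mean) * scale

params = beta_with_spikes(mean=0.48, var=0.1096, p_loss=0.1, p_fix=0.2)
```

In this example the inputs were built from spikes of 0.1 and 0.2 plus a Beta(2, 3) component of weight 0.7, so the fit recovers alpha = 2 and beta = 3 up to rounding.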
Approximations: Drift only
Approximations: Drift only
The Beta with spikes: Drift only / Selection
The Beta with spikes: Drift only / Selection
The Beta with spikes: Drift only / Selection
Inference of split times: Drift only
›Felsenstein’s peeling algorithm
›Numerically optimized likelihood
›5000 independent loci
›100 samples in each population
›40 data sets
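The likelihood needs the probability of the observed allele counts given the allele frequency distribution in each population. The sketch below covers only this sampling step, not the peeling algorithm or the optimization: it assumes the population frequency follows a Beta with spikes distribution with known parameters and that n gene copies are sampled binomially. The function names and argument order are illustrative.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def sample_count_probability(k, n, p_loss, p_fix, alpha, beta):
    """P(k copies of the allele among n sampled) under a Beta with spikes frequency."""
    p_poly = 1.0 - p_loss - p_fix
    # Beta part: integrating the binomial against Beta(alpha, beta) gives a beta-binomial
    beta_binomial = comb(n, k) * exp(log_beta(k + alpha, n - k + beta)
                                     - log_beta(alpha, beta))
    prob = p_poly * beta_binomial
    if k == 0:
        prob += p_loss   # the allele is lost in the population
    if k == n:
        prob += p_fix    # the allele is fixed in the population
    return prob

probs = [sample_count_probability(k, 100, 0.1, 0.2, 2.0, 3.0) for k in range(101)]
```

The spikes add mass only to the monomorphic sample counts k = 0 and k = n, which is where a plain Beta fit tends to be least accurate.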
Inference of split times: Drift only
Conclusions
›Beta with spikes: new approximation to the WF
› Quality of approximation
› Consistent
› Diffusion > Beta with spikes > Beta
› Simple mathematical formulation -> decrease in speed
› Inference of split times
› Beta with spikes ~ Kim Tree
Loss and fixation probabilities
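One way to update the loss and fixation probabilities by one generation, sketched here for the drift-only case, uses the fact that a population at frequency x loses the allele in the next generation with probability (1 - x)^(2N) and fixes it with probability x^(2N). Averaging these over a Beta(alpha, beta) distribution for the polymorphic part gives ratios of Beta functions. This is a sketch under those assumptions and is not taken verbatim from the talk; the function names are illustrative.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def update_spikes(p_loss, p_fix, alpha, beta, two_n):
    """One-generation update of the loss and fixation probabilities under pure drift."""
    p_poly = 1.0 - p_loss - p_fix
    # E[(1 - X)^(2N)] and E[X^(2N)] for X ~ Beta(alpha, beta)
    to_loss = exp(log_beta(alpha, beta + two_n) - log_beta(alpha, beta))
    to_fix = exp(log_beta(alpha + two_n, beta) - log_beta(alpha, beta))
    return p_loss + p_poly * to_loss, p_fix + p_poly * to_fix

new_loss, new_fix = update_spikes(0.0, 0.0, alpha=1.0, beta=1.0, two_n=9)
```

For a uniform polymorphic part (alpha = beta = 1) and 2N = 9, both expectations equal 1 / (2N + 1) = 0.1, which gives a quick check of the formula.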