Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approaches.


Gota Morota

Nov 30, 2010

Outline

Change of Allele Frequencies as Stochastic Processes

Steady State Distributions of Allele Frequencies

Time Series Analysis

Various factors affecting allele frequencies

• Selection, mutation and migration (crossbreeding) ⇒
systematic pressures (Wright 1949)
• Random fluctuations
1. Random sampling of gametes (genetic drift)
2. Random fluctuation in systematic pressures

⇓
Allele frequencies are functions of both the systematic forces and the
random components
Random walk ⇒ Brownian Motion
Figures 1-4: a random walk plotted over Time = [1:10], [1:100], [1:1000] and [1:10000].
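The scaling behind "random walk ⇒ Brownian motion" can be checked by simulation. This sketch assumes ±1 steps (mean 0, variance 1) and the rescaling W(t) = S_⌊nt⌋ / √n; the function name and parameter values are illustrative. If the rescaled walk behaves like Brownian motion, its variance at time t should be close to t.

```python
import numpy as np


def scaled_random_walk(n, t_max, n_paths, rng):
    """Simulate n_paths random walks with +1/-1 steps (mean 0, variance 1),
    rescaled so that step k sits at time k / n and heights shrink by 1 / sqrt(n)."""
    n_steps = int(n * t_max)
    steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
    paths = np.cumsum(steps, axis=1) / np.sqrt(n)
    times = np.arange(1, n_steps + 1) / n
    return times, paths


rng = np.random.default_rng(2010)
n = 500
times, paths = scaled_random_walk(n=n, t_max=2.0, n_paths=4000, rng=rng)

# Brownian motion has mean 0 and variance t at time t; compare at three time points.
variances = {}
for t in (0.5, 1.0, 2.0):
    k = int(round(t * n)) - 1  # column holding time t
    variances[t] = paths[:, k].var()
    print(f"t = {t}: sample mean {paths[:, k].mean():+.3f}, sample variance {variances[t]:.3f}")
```

Larger n makes each path look more like a continuous curve while the variance at a fixed t stays at t.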
Brownian Motion ⇒ Diffusion Model

Figure 5: Time = [1:10000]. A Brownian motion path, annotated "+ conditional on systematic forces".

• Treat the change of allele frequencies as a stochastic process

⇓

Diffusion Model

Diffusion Model

A diffusion model describes the infinitely many paths that allele frequencies
could take over time under given systematic pressures.

Figure: three panels plotting allele frequency against Time (0 to 10000).

• Pick a single time point t (say t = 5000 in the plots above)
• Find the PDF of the allele frequency at time t

• This requires solving a partial differential equation (PDE)
• Fokker-Planck Equation!
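The recipe on this slide (many paths, then the PDF at a single time t) can be sketched with an Euler-Maruyama simulation; this is an illustrative choice, not the author's method. The callables `drift` and `variance` are hypothetical stand-ins for the mean and variance of the change per unit time. With constant values (Brownian motion with drift) the exact answer N(p + Mt, Vt) is known, so the histogram at time t can be checked against it.

```python
import numpy as np


def euler_maruyama(drift, variance, p, t, n_steps, n_paths, rng):
    """Simulate n_paths of dX = M(X) dt + sqrt(V(X)) dW started at X(0) = p
    and return the values X(t) at the single time point t."""
    dt = t / n_steps
    x = np.full(n_paths, p, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + drift(x) * dt + np.sqrt(variance(x)) * dw
    return x


rng = np.random.default_rng(5000)
p, t, m, v = 0.0, 2.0, 0.5, 1.0
x_t = euler_maruyama(lambda x: m, lambda x: v, p, t, n_steps=200, n_paths=20000, rng=rng)

# For constant M and V the PDF at time t is N(p + M t, V t).
density, edges = np.histogram(x_t, bins=40, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
exact = np.exp(-(centres - (p + m * t)) ** 2 / (2 * v * t)) / np.sqrt(2 * np.pi * v * t)
print("sample mean", x_t.mean(), "sample variance", x_t.var())
print("largest histogram error", np.abs(density - exact).max())
```

For state-dependent M and V (as for allele frequencies) the same function applies, but there is usually no closed-form PDF to compare against, which is where the PDE on the next slide comes in.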
Fokker-Planck Equation
• Derived from a continuous-time stochastic process (X)
• Partial differential equation

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{V_{\delta x}\,\phi(p,x;t)\right\} - \frac{\partial}{\partial x}\left\{M_{\delta x}\,\phi(p,x;t)\right\} \tag{1}$$

where
• p: initial allele frequency (fixed)
• x: allele frequency (random variable)
• t: time (continuous variable)
• φ(p, x; t): PDF of x at time t, given the initial frequency p
• V_δx: variance of δx, the change in allele frequency per unit time
• M_δx: mean of δx, the change in allele frequency per unit time
• V_δx and M_δx may both depend on x and t
Fokker-Planck Equation for Brownian Motion
A standard Brownian motion can be constructed as the limit of a suitably rescaled random walk whose
steps have mean 0 and variance 1. At time t it is distributed as N(0, t):
• when t = 1.0, N(0, 1)
• when t = 1.5, N(0, 1.5)

Fokker-Planck equation, obtained by setting M_δx = 0 and V_δx = 1 in equation (1):

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{2}\frac{\partial^2 \phi(p,x;t)}{\partial x^2} \tag{2}$$

This is the heat equation.

Solution (standard Brownian motion starts at p = 0):

$$\phi(p,x;t) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{x^2}{2t}\right) \tag{4}$$
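A quick numerical check that the heat kernel (4) solves the heat equation (2): central finite differences for ∂φ/∂t and ∂²φ/∂x² should agree up to discretization error. The step sizes are arbitrary illustrative choices. The second loop shows the behaviour plotted on the next slide: the kernel integrates to 1 for every t > 0, and its peak height 1/√(2πt) grows as t shrinks.

```python
import numpy as np


def heat_kernel(x, t):
    """Equation (4): density of standard Brownian motion (started at 0) at time t."""
    return np.exp(-x ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)


x = np.linspace(-2.0, 2.0, 401)
t, h, k = 1.0, 1e-3, 1e-4  # time point, space step, time step

# Central finite differences for the two sides of the heat equation (2).
dphi_dt = (heat_kernel(x, t + k) - heat_kernel(x, t - k)) / (2.0 * k)
d2phi_dx2 = (heat_kernel(x + h, t) - 2.0 * heat_kernel(x, t) + heat_kernel(x - h, t)) / h ** 2
residual = np.max(np.abs(dphi_dt - 0.5 * d2phi_dx2))
print("max |dphi/dt - 0.5 d2phi/dx2| =", residual)

# The kernel is a PDF for every t > 0 and sharpens around 0 as t -> 0.
grid = np.linspace(-50.0, 50.0, 200001)
dx = grid[1] - grid[0]
for t_i in (0.01, 0.1, 1.0, 10.0):
    mass = heat_kernel(grid, t_i).sum() * dx
    print(f"t = {t_i:>5}: total mass {mass:.6f}, peak height {heat_kernel(0.0, t_i):.3f}")
```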
Solution of the Heat Equation (the Heat Kernel)

Figure: the heat kernel (4) plotted against x ∈ [−2, 2] for t = 0.00001, 0.01, 0.1, 1 and 10.
Under Random Genetic Drift
$$M_{\delta x} = 0, \qquad V_{\delta x} = \frac{x(1-x)}{2N_e}$$

Fokker-Planck equation for random genetic drift:

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1-x)\,\phi(p,x;t)\right\} \tag{5}$$

Solutions are obtained as infinite series, using
• Kimura (1955): hypergeometric functions
• Korn and Korn (1968): Gegenbauer polynomials

$$\phi = 6p(1-p)\exp\left(-\frac{t}{2N_e}\right) + 30p(1-p)(1-2p)(1-2x)\exp\left(-\frac{3t}{2N_e}\right) + \cdots$$
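The series can be summed numerically. This sketch uses the Gegenbauer-polynomial form φ = Σ_{i≥1} p(1−p) · 4(2i+1)/(i(i+1)) · C^(3/2)_{i−1}(1−2p) · C^(3/2)_{i−1}(1−2x) · exp(−i(i+1)t/(4N_e)), written from standard references rather than from this slide. Its first two terms reproduce the expression above (i = 1 gives 6p(1−p)e^{−t/(2N_e)}, i = 2 gives the 30p(1−p)(1−2p)(1−2x) term). The function names and the truncation at 100 terms are illustrative choices.

```python
import numpy as np


def gegenbauer_32(n_max, z):
    """[C_0(z), ..., C_n_max(z)] for the Gegenbauer polynomials C_n^(3/2), via the
    three-term recurrence n C_n = 2z(n + 1/2) C_{n-1} - (n + 1) C_{n-2}."""
    z = np.asarray(z, dtype=float)
    polys = [np.ones_like(z), 3.0 * z]
    for n in range(2, n_max + 1):
        polys.append((2.0 * z * (n + 0.5) * polys[n - 1] - (n + 1.0) * polys[n - 2]) / n)
    return polys[: n_max + 1]


def kimura_drift_density(p, x, t, ne, n_terms=100):
    """Series solution of equation (5) for 0 < x < 1. Only unfixed populations
    are counted, so the density integrates to P(allele still segregating)."""
    cp = gegenbauer_32(n_terms - 1, 1.0 - 2.0 * p)
    cx = gegenbauer_32(n_terms - 1, 1.0 - 2.0 * np.asarray(x, dtype=float))
    phi = np.zeros_like(cx[0])
    for i in range(1, n_terms + 1):
        weight = 4.0 * (2 * i + 1) / (i * (i + 1))
        decay = np.exp(-i * (i + 1) * t / (4.0 * ne))
        phi = phi + p * (1.0 - p) * weight * cp[i - 1] * cx[i - 1] * decay
    return phi


def two_term_series(p, x, t, ne):
    """The first two terms of the series, as written on this slide."""
    x = np.asarray(x, dtype=float)
    return (6 * p * (1 - p) * np.exp(-t / (2 * ne))
            + 30 * p * (1 - p) * (1 - 2 * p) * (1 - 2 * x) * np.exp(-3 * t / (2 * ne)))


ne, p = 50, 0.3
x = (np.arange(2000) + 0.5) / 2000  # midpoints of 2000 cells on (0, 1)
for t in (1, 50, 100, 200):
    phi = kimura_drift_density(p, x, t, ne)
    print(f"t = {t:>3}: P(still segregating) ~ {phi.mean():.4f}")
```

For small t many terms are needed (the solution starts as a spike at p); for t of order 4N_e the two-term expression is already very accurate.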
Solution of FPE (Kimura 1955)
Figures 1-2 reproduced from Kimura (1955), Vol. 41, p. 149: "The processes of the change in the probability
distribution of heterallelic classes, due to random sampling of gametes in reproduction. It is assumed
that the population starts from the gene frequency 0.5 in Fig. 1 (left) and 0.1 in Fig. 2 (right).
t = time in generations; N = effective size of the population; abscissa is gene frequency; ordinate is
probability density."
Under Selection and Random Genetic Drift
$$M_{\delta x} = s\,x(1-x), \qquad V_{\delta x} = \frac{x(1-x)}{2N_e}$$

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1-x)\,\phi(p,x;t)\right\} - s\,\frac{\partial}{\partial x}\left\{x(1-x)\,\phi(p,x;t)\right\} \tag{6}$$

Solutions are obtained as infinite series of oblate spheroidal wave functions, after transforming
allele frequency to z = 1 − 2x
• Kimura (1955)
• Kimura and Crow (1956)

$$\phi(p,x;t) = \sum_{k=0}^{\infty} C_k \exp\left(-\lambda_k t + 2cx\right) V_{1k}^{(1)}(z) \tag{7}$$

where

$$V_{1k}^{(1)}(z) = {\sum_{n=0,1}}' f_n^{k}\, T_n^{1}(z)$$
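Where do M_δx and V_δx come from? A sketch, assuming a Wright-Fisher population with genic selection (selection moves x to x(1+s)/(1+sx), then 2N_e gene copies are drawn binomially): the exact mean and variance of the one-generation change match the diffusion coefficients above up to terms of order s. The model and the function name are illustrative assumptions, not taken from the slide.

```python
import numpy as np


def wf_selection_moments(x, s, ne):
    """Exact mean and variance of the one-generation change in allele frequency
    in a Wright-Fisher population with genic selection (assumed model): selection
    moves x to x(1 + s)/(1 + s x), then 2 * ne gene copies are drawn binomially."""
    x_sel = x * (1 + s) / (1 + s * x)
    mean_change = x_sel - x
    var_change = x_sel * (1 - x_sel) / (2 * ne)
    return mean_change, var_change


s, ne = 0.01, 100
x = np.linspace(0.05, 0.95, 19)
mean_change, var_change = wf_selection_moments(x, s, ne)
m_dx = s * x * (1 - x)            # M_dx used in equation (6)
v_dx = x * (1 - x) / (2 * ne)     # V_dx used in equation (6)
print("max relative error in mean:", np.max(np.abs(mean_change / m_dx - 1)))
print("max relative error in variance:", np.max(np.abs(var_change / v_dx - 1)))

# With strong selection the approximation M_dx = s x(1 - x) breaks down.
strong_mean, _ = wf_selection_moments(x, 0.5, ne)
print("s = 0.5, max relative error in mean:", np.max(np.abs(strong_mean / (0.5 * x * (1 - x)) - 1)))
```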
Kolmogorov Backward Equation
• Derived from a continuous-time stochastic process (P)
• Partial differential equation

$$\frac{\partial \phi(p,x;t)}{\partial t} = \frac{1}{2}V_{\delta p}\frac{\partial^2 \phi(p,x;t)}{\partial p^2} + M_{\delta p}\frac{\partial \phi(p,x;t)}{\partial p} \tag{8}$$

where
• p: initial allele frequency (the variable of the equation)
• x: allele frequency at time t (held fixed)
• t: time (continuous variable)
• φ(p, x; t): PDF
• V_δp: variance of δp, the change in allele frequency per unit time
• M_δp: mean of δp, the change in allele frequency per unit time
• V_δp and M_δp may both depend on p but not on t (time homogeneous)
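A standard use of equation (8): the probability u(p) that an allele starting at frequency p is eventually fixed does not depend on t, so setting ∂/∂t = 0 with M = s p(1−p) and V = p(1−p)/(2N_e) gives u'' + 4N_e s u' = 0 with u(0) = 0 and u(1) = 1, hence u(p) = (1 − e^{−4N_e s p}) / (1 − e^{−4N_e s}). The sketch compares this with a Wright-Fisher simulation; the simulation details (genic selection applied before binomial sampling, census size equal to N_e) are assumptions for illustration.

```python
import numpy as np


def fixation_probability(p, s, ne):
    """Solution u(p) of the time-independent backward equation
    (1/2) V u'' + M u' = 0 with M = s p(1-p), V = p(1-p)/(2 Ne), u(0) = 0, u(1) = 1."""
    if s == 0:
        return p
    return (1 - np.exp(-4 * ne * s * p)) / (1 - np.exp(-4 * ne * s))


def simulated_fixation(p0, s, ne, n_reps, generations, rng):
    """Fraction of Wright-Fisher replicates (genic selection, then binomial
    sampling of 2 * ne gene copies) in which the allele has been fixed."""
    n_copies = 2 * ne
    x = np.full(n_reps, p0)
    for _ in range(generations):
        x_sel = x * (1 + s) / (1 + s * x)
        x = rng.binomial(n_copies, x_sel) / n_copies
    return np.mean(x == 1.0)


rng = np.random.default_rng(1962)
p0, s, ne = 0.1, 0.01, 50
u_theory = fixation_probability(p0, s, ne)
u_sim = simulated_fixation(p0, s, ne, n_reps=5000, generations=2000, rng=rng)
print(f"diffusion: {u_theory:.3f}, simulation: {u_sim:.3f}, neutral: {fixation_probability(p0, 0.0, ne):.3f}")
```

With 4N_e s = 2, a favourable allele starting at 10% is fixed roughly twice as often as a neutral one.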
Steady State Distribution of Allele Frequencies
Equilibrium
• a single point (the balance between various forces that keeps allele frequencies near equilibrium)
• a PDF

⇓
PDF of a stable equilibrium instead of a single point:
the steady-state allele frequency distribution
• Fisher (1922), (1930)
• Wright (1931), (1937), (1938)

$$\phi(p,x;t) = \text{solution of a Fokker-Planck equation} \tag{9}$$

$$\lim_{t\to\infty}\phi(p,x;t) = \phi(x) \tag{10}$$

$$\phi(x) = \frac{C}{V_{\delta x}}\exp\left(2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx\right) \tag{11}$$

where C is the normalizing constant that makes φ(x) integrate to 1.
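Equation (11) can be evaluated numerically for any M_δx and V_δx. In this sketch the midpoint grid, the cumulative trapezoid integral and the function name are illustrative choices. It is checked on mutation and drift alone (x the frequency of A, u the rate A → a, v the rate a → A), where (11) reduces to the beta density x^{4N_e v − 1}(1 − x)^{4N_e u − 1}; with the values below that is 20x³(1 − x).

```python
import numpy as np


def wright_stationary_density(mean_fn, var_fn, n_grid=20000):
    """Equation (11): phi(x) = C / V(x) * exp(2 * integral of M(x)/V(x) dx),
    evaluated on midpoints of (0, 1); C is found by numerical normalization."""
    dx = 1.0 / n_grid
    x = (np.arange(n_grid) + 0.5) * dx
    ratio = 2.0 * mean_fn(x) / var_fn(x)
    # Cumulative trapezoid integral of 2M/V; the lower limit only rescales C.
    log_integral = np.concatenate(([0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * dx)))
    log_phi = log_integral - np.log(var_fn(x))
    phi = np.exp(log_phi - log_phi.max())  # subtract the max to avoid overflow
    phi /= phi.sum() * dx
    return x, phi


# Example: mutation (A -> a at rate u, a -> A at rate v) and drift, no selection.
ne, u, v = 100, 0.005, 0.01
x, phi = wright_stationary_density(
    lambda x: -u * x + v * (1 - x),
    lambda x: x * (1 - x) / (2 * ne),
)
mean_x = np.sum(x * phi) / np.sum(phi)
print(f"stationary mean {mean_x:.4f}; v / (u + v) = {v / (u + v):.4f}")
```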
Steady State Distribution – Random Genetic Drift

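For large t, only the first few terms matter, because the higher-order terms decay faster
(at rates 3/(2N_e), 6/(2N_e), ... per generation).

$$\phi = 6p(1-p)\exp\left(-\frac{t}{2N_e}\right) + 30p(1-p)(1-2p)(1-2x)\exp\left(-\frac{3t}{2N_e}\right) + \cdots$$

Asymptotic formula: for large t the distribution of unfixed frequencies becomes flat in x and
decays at rate 1/(2N_e) per generation,

$$\phi \approx C\exp\left(-\frac{t}{2N_e}\right), \qquad C = 6p(1-p).$$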
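The decay rate 1/(2N_e) can be seen without solving (5). In a neutral Wright-Fisher population (an assumed model, with census size equal to N_e) the mean heterozygosity E[2x(1−x)] shrinks by exactly 1 − 1/(2N_e) ≈ e^{−1/(2N_e)} per generation. The sketch below simulates this; the function name, seed and parameter values are illustrative.

```python
import numpy as np


def wright_fisher_heterozygosity(p0, ne, generations, n_reps, rng):
    """Neutral Wright-Fisher model with 2 * ne gene copies per generation.
    Returns the mean of 2x(1 - x) over replicates for generations 0..generations."""
    n_copies = 2 * ne
    x = np.full(n_reps, p0)
    het = np.empty(generations + 1)
    het[0] = np.mean(2 * x * (1 - x))
    for g in range(1, generations + 1):
        x = rng.binomial(n_copies, x) / n_copies  # random sampling of gametes
        het[g] = np.mean(2 * x * (1 - x))
    return het


rng = np.random.default_rng(1931)
ne, p0, gens = 50, 0.3, 300
het = wright_fisher_heterozygosity(p0, ne, gens, n_reps=20000, rng=rng)
t = np.arange(gens + 1)
theory = 2 * p0 * (1 - p0) * np.exp(-t / (2 * ne))

# Slope of log(heterozygosity) between generations 50 and 200 estimates the decay rate.
rate = -np.polyfit(t[50:201], np.log(het[50:201]), 1)[0]
print(f"estimated decay rate {rate:.4f} per generation; 1/(2Ne) = {1 / (2 * ne):.4f}")
```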
Graphical Representation (Wright 1931)

Figure 3 of Wright (1931), abscissa: factor frequency (0 to 100%): "Distribution of gene frequencies in an
isolated population in which fixation and loss of genes each is proceeding at the rate 1/4N in the
absence of appreciable selection or mutation ..."
Steady State Distribution – Selection and Mutation

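$$M_{\delta x} = -ux + v(1-x) + \frac{x(1-x)}{2}\frac{d\bar{a}}{dx}, \qquad V_{\delta x} = \frac{x(1-x)}{2N_e}$$

$$\phi(x) = C\exp\left(2N_e\bar{a}\right)x^{4N_e v-1}(1-x)^{4N_e u-1} \tag{12}$$

where x is the frequency of A, u the mutation rate from A to a, v the rate from a to A, and ā the mean
fitness of the population. When A has selective advantage s over a (fitnesses 2s, s and 0 for AA, Aa and aa):

$$\bar{a} = 2s\,x^2 + s\cdot 2x(1-x) + 0\cdot(1-x)^2 = 2sx$$

$$\phi(x) = C\exp\left(4N_e s x\right)x^{4N_e v-1}(1-x)^{4N_e u-1} \tag{13}$$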
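The constant C in equation (13) can be found numerically. This sketch (grid size, function names and parameter values are illustrative) shows the mean stationary frequency of A rising with s; with s = 0 and u = v it is 1/2 by symmetry.

```python
import numpy as np


def selection_mutation_density(x, s, u, v, ne):
    """Unnormalized equation (13): exp(4 Ne s x) * x^(4 Ne v - 1) * (1 - x)^(4 Ne u - 1)."""
    return np.exp(4 * ne * s * x) * x ** (4 * ne * v - 1) * (1 - x) ** (4 * ne * u - 1)


def stationary_mean(s, u, v, ne, n_grid=20000):
    """Mean allele frequency under (13); C is found by midpoint-rule normalization."""
    dx = 1.0 / n_grid
    x = (np.arange(n_grid) + 0.5) * dx
    phi = selection_mutation_density(x, s, u, v, ne)
    phi = phi / (phi.sum() * dx)
    return np.sum(x * phi) * dx


ne, u, v = 100, 0.005, 0.005
for s in (0.0, 0.001, 0.005, 0.02):
    print(f"s = {s:<6}: mean stationary frequency of A = {stationary_mean(s, u, v, ne):.3f}")
```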
Graphical Representation (Wright 1937)

Figures 1, 2, 4, 5 and 6 reproduced from Wright (1937), Proc. N. A. S., p. 308.
Time Series Analysis

When a variable is measured sequentially in time, the resulting data form
a time series.
• Diffusion Model – continuous-time stochastic process
• Time Series – discrete-time stochastic process
Basic Models
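Observations close together in time tend to be correlated.
• Autoregressive Model: AR(p)

$$X_t = c + \sum_{i=1}^{p}\psi_i X_{t-i} + \varepsilon_t \tag{14}$$

• Moving Average Model: MA(q)

$$X_t = c + \sum_{i=1}^{q}\theta_i \varepsilon_{t-i} + \varepsilon_t \tag{15}$$

• Autoregressive Moving Average Model: ARMA(p, q), combining the AR(p) and MA(q) terms

$$X_t = c + \sum_{i=1}^{p}\psi_i X_{t-i} + \sum_{i=1}^{q}\theta_i \varepsilon_{t-i} + \varepsilon_t \tag{16}$$

where ε_t is white noise.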
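Equation (16) can be simulated directly from its definition. In this sketch the white noise ε_t is assumed Gaussian, a burn-in period discards start-up effects, and `simulate_arma` is an illustrative name. The lag-1 correlations should match the textbook values: ψ for AR(1) and θ/(1 + θ²) for MA(1).

```python
import numpy as np


def simulate_arma(psi, theta, n, c=0.0, sigma=1.0, burn_in=500, rng=None):
    """Simulate equation (16), X_t = c + sum_i psi_i X_{t-i} + sum_i theta_i e_{t-i} + e_t,
    with Gaussian white noise e_t ~ N(0, sigma^2). AR(p): theta = []; MA(q): psi = []."""
    rng = np.random.default_rng() if rng is None else rng
    total = n + burn_in
    e = rng.normal(0.0, sigma, size=total)
    x = np.zeros(total)
    for t in range(total):
        ar = sum(psi[i] * x[t - 1 - i] for i in range(len(psi)) if t - 1 - i >= 0)
        ma = sum(theta[i] * e[t - 1 - i] for i in range(len(theta)) if t - 1 - i >= 0)
        x[t] = c + ar + ma + e[t]
    return x[burn_in:]  # drop the start-up transient


def lag1_correlation(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(14)
ar1 = simulate_arma([0.6], [], n=20000, rng=rng)
ma1 = simulate_arma([], [0.5], n=20000, rng=rng)
print("AR(1), psi = 0.6:   lag-1 correlation", round(lag1_correlation(ar1), 3))
print("MA(1), theta = 0.5: lag-1 correlation", round(lag1_correlation(ma1), 3), "(theory 0.4)")
```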
Time Series as a Polynomial Equation
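$$B^k X_t = X_{t-k} \qquad \text{(backshift operator)}$$
• AR(p)

$$X_t = \psi_1 X_{t-1} + \cdots + \psi_p X_{t-p} + \varepsilon_t$$
$$X_t = (\psi_1 B + \cdots + \psi_p B^p)X_t + \varepsilon_t$$
$$(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = \varepsilon_t$$

• ARMA(p,q)

$$X_t - \psi_1 X_{t-1} - \cdots - \psi_p X_{t-p} = \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$$
$$(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\varepsilon_t$$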
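Writing AR(p) as a polynomial in B gives a practical stationarity test: the process is stationary when every root of 1 − ψ₁z − ⋯ − ψ_p z^p lies outside the unit circle. A minimal sketch (the function name is illustrative); the coefficient 0.635 is the estimate reported later in this talk, and ψ = 1 is the random walk, whose root sits on the unit circle.

```python
import numpy as np


def ar_is_stationary(psi):
    """(1 - psi_1 B - ... - psi_p B^p) X_t = e_t is stationary when every root of
    the polynomial 1 - psi_1 z - ... - psi_p z^p lies outside the unit circle."""
    # np.roots wants coefficients from the highest power of z down to the constant.
    coeffs = [-c for c in reversed(psi)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0)), roots


for psi in ([0.635], [1.0], [0.5, 0.3], [0.5, 0.6]):
    stationary, roots = ar_is_stationary(psi)
    print(psi, "stationary" if stationary else "not stationary", np.round(np.abs(roots), 3))
```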
Stationary Process
A (weakly) stationary process has a mean and variance that do not change over time (no trend), and the
covariance between X_t and X_{t+k} depends only on the lag k.

Figure 6: Random walk (not stationary). Figure 7: Detrended series (looks stationary). Both plotted for Time = [1:10000].

Detrending:
• linear regression on time
• taking a difference, X_t − X_{t−1}
• Autoregressive Integrated Moving Average: ARIMA(p,d,q), an ARMA(p,q) model for the series differenced d times
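The two detrending routes behave differently on a random walk. In this sketch (simulated data, illustrative seed), regressing on time leaves residuals that are still strongly autocorrelated, while first differencing recovers the white-noise shocks exactly; this is the reason the "d" in ARIMA(p,d,q) counts differences.

```python
import numpy as np


def lag1_correlation(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]


rng = np.random.default_rng(2010)
n = 10000
shocks = rng.normal(size=n)
walk = np.cumsum(shocks)  # random walk, not stationary

# Option 1: detrend by linear regression on time and keep the residuals.
t = np.arange(n)
slope, intercept = np.polyfit(t, walk, 1)
residuals = walk - (intercept + slope * t)

# Option 2: first-order difference, (1 - B) X_t = X_t - X_{t-1}.
differenced = np.diff(walk)

print("lag-1 correlation after regression:", round(lag1_correlation(residuals), 3))
print("lag-1 correlation after differencing:", round(lag1_correlation(differenced), 3))
```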
Application to Allele Frequencies
• Influential SNPs – indicative of deterministic trends
• Uninfluential SNPs – random fluctuation?
• Diffusion Model – assumes a Markovian process
• Time Series – which model describes the process of change of allele frequencies?

Application
• Objective: model the process of change of allele frequencies
• Data: SNP genotypes of 4,798 Holstein bulls at 38,416 markers, and milk yield
• Genotype imputation: FastPhase 1.4
• Estimation of marker effects: BayesCπ
BayesCπ

Figure 8: GAW17. Title page of "Analysis of human mini-exome sequencing data using a Bayesian hierarchical
mixture model: Genetic Analysis Workshop 17" by Bueno Filho JS, Morota G, Tran QT, Maenner MJ,
Vera-Cala LM, Engelman CD and Meyers KJ.
Allele Frequency of the Top Marker

Figure 9: Time plots of the allele frequency of the top marker (Time 0 to 30). Top: original series.
Bottom: detrended by taking the first-order difference.
Autocorrelation and Partial Autocorrelation
ARIMA(1,1,1)?

Figure 10: ACF and PACF of the original series (top) and of the first-order difference series (bottom), lags 0 to 14.
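The sample ACF and PACF in Figure 10 can be computed with a few lines of NumPy. This sketch uses the Durbin-Levinson recursion for the PACF and demonstrates the usual identification rule on simulated AR(1) data (ψ = 0.635, an illustrative value): the ACF decays geometrically while the PACF cuts off after lag 1. For an MA(q) process the roles swap, with the ACF cutting off after lag q.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi = np.zeros(max_lag + 1)  # phi[1:k] holds the AR(k-1) coefficients
    for k in range(1, max_lag + 1):
        num = rho[k] - np.dot(phi[1:k], rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi[1:k], rho[1:k])
        phi_kk = num / den
        new_phi = phi.copy()
        new_phi[1:k] = phi[1:k] - phi_kk * phi[k - 1:0:-1]
        new_phi[k] = phi_kk
        phi = new_phi
        pacf[k] = phi_kk
    return pacf


# Simulated AR(1) with psi = 0.635.
rng = np.random.default_rng(28)
e = rng.normal(size=20000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.635 * x[t - 1] + e[t]
acf, pacf = sample_acf(x, 14), sample_pacf(x, 14)
print("ACF :", np.round(acf[:5], 3))
print("PACF:", np.round(pacf[:5], 3))
```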
Model Selection

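Table 1: Comparison of several competing models

Model          AIC
ARIMA(1,0,0)   −51.56
ARIMA(0,1,0)   −49.38
ARIMA(0,0,1)   −46.41
ARIMA(1,1,0)   −52.47
ARIMA(1,0,1)   −51.13
ARIMA(1,1,1)   −51.02

Selected model (lowest AIC): ARIMA(1,1,0)

$$(1 - 0.635B)(1 - B)X_t = \varepsilon_t, \quad \text{i.e.} \quad \nabla X_t = 0.635\,\nabla X_{t-1} + \varepsilon_t, \quad \nabla X_t = X_t - X_{t-1}$$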
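A rough sketch of how AIC = −2 log L + 2k weighs ARIMA(1,1,0) against ARIMA(0,1,0). It fits by conditional least squares on simulated data (ψ = 0.635 borrowed from the fitted model above, series length and seed illustrative), so its AIC values are not comparable with Table 1; it only shows the fit-versus-parameters trade-off.

```python
import numpy as np


def gaussian_aic(resid, n_params):
    """AIC = -2 log L + 2k, with the Gaussian log-likelihood at the MLE of sigma^2."""
    n = len(resid)
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + 2 * n_params


def fit_arima_110(x):
    """Conditional least squares for ARIMA(1,1,0) without constant:
    y_t = psi * y_{t-1} + e_t on the differences y_t = x_t - x_{t-1}."""
    y = np.diff(x)
    y_lag, y_now = y[:-1], y[1:]
    psi = np.dot(y_lag, y_now) / np.dot(y_lag, y_lag)
    return psi, gaussian_aic(y_now - psi * y_lag, n_params=2)  # psi and sigma^2


def fit_arima_010(x):
    """ARIMA(0,1,0): the differences are white noise (same sample as above)."""
    y = np.diff(x)
    return gaussian_aic(y[1:], n_params=1)  # sigma^2 only


# Hypothetical data: simulate an ARIMA(1,1,0) series with psi = 0.635.
rng = np.random.default_rng(29)
e = rng.normal(size=500)
dx = np.zeros(500)
for t in range(1, 500):
    dx[t] = 0.635 * dx[t - 1] + e[t]
x = np.cumsum(dx)

psi_hat, aic_110 = fit_arima_110(x)
aic_010 = fit_arima_010(x)
print(f"psi_hat = {psi_hat:.3f}, AIC ARIMA(1,1,0) = {aic_110:.1f}, AIC ARIMA(0,1,0) = {aic_010:.1f}")
```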
Advanced Models

Time-dependent variance
• ARCH (Autoregressive Conditional Heteroskedasticity)
• GARCH (Generalized Autoregressive Conditional
Heteroskedasticity)

Multivariate
• VARMA (Vector Autoregressive Moving Average)
• BVARMA (Bayesian Vector Autoregressive Moving Average)
Intersection of Mathematics and Statistics

Under certain conditions,
GARCH(1,1) ≈ Diffusion Model!
Thank you!

32 / 32
1 of 32

Recommended

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
277 views39 slides
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R... by
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...Chiheb Ben Hammouda
28 views43 slides
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
364 views72 slides
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and... by
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...The Statistical and Applied Mathematical Sciences Institute
342 views39 slides
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie... by
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...The Statistical and Applied Mathematical Sciences Institute
432 views24 slides
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
345 views35 slides

More Related Content

What's hot

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
306 views44 slides
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
438 views63 slides
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ... by
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...The Statistical and Applied Mathematical Sciences Institute
107 views24 slides
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F... by
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...
CLIM Fall 2017 Course: Statistics for Climate Research, Guest lecture: Data F...The Statistical and Applied Mathematical Sciences Institute
440 views93 slides
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,... by
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...The Statistical and Applied Mathematical Sciences Institute
313 views32 slides
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie... by
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...The Statistical and Applied Mathematical Sciences Institute
386 views26 slides

What's hot(20)

2013.06.18 Time Series Analysis Workshop ..Applications in Physiology, Climat... by NUI Galway
2013.06.18 Time Series Analysis Workshop ..Applications in Physiology, Climat...2013.06.18 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.18 Time Series Analysis Workshop ..Applications in Physiology, Climat...
NUI Galway1.8K views
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat... by NUI Galway
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
NUI Galway2.4K views
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat... by NUI Galway
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
2013.06.17 Time Series Analysis Workshop ..Applications in Physiology, Climat...
NUI Galway3K views
Sampling strategies for Sequential Monte Carlo (SMC) methods by Stephane Senecal
Sampling strategies for Sequential Monte Carlo (SMC) methodsSampling strategies for Sequential Monte Carlo (SMC) methods
Sampling strategies for Sequential Monte Carlo (SMC) methods
Stephane Senecal927 views
Random Matrix Theory and Machine Learning - Part 4 by Fabian Pedregosa
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
Fabian Pedregosa21.9K views
A lambda calculus for density matrices with classical and probabilistic controls by Alejandro Díaz-Caro
A lambda calculus for density matrices with classical and probabilistic controlsA lambda calculus for density matrices with classical and probabilistic controls
A lambda calculus for density matrices with classical and probabilistic controls

Similar to Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approaches.

Looking Inside Mechanistic Models of Carcinogenesis by
Looking Inside Mechanistic Models of CarcinogenesisLooking Inside Mechanistic Models of Carcinogenesis
Looking Inside Mechanistic Models of CarcinogenesisSascha Zöllner
365 views51 slides
extreme times in finance heston model.ppt by
extreme times in finance heston model.pptextreme times in finance heston model.ppt
extreme times in finance heston model.pptArounaGanou2
16 views40 slides
main by
mainmain
mainDavid Mateos
194 views75 slides
Sequential Monte Carlo algorithms for agent-based models of disease transmission by
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmissionJeremyHeng10
62 views57 slides
Computational Information Geometry on Matrix Manifolds (ICTP 2013) by
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Frank Nielsen
741 views56 slides
Frequency14.pptx by
Frequency14.pptxFrequency14.pptx
Frequency14.pptxMewadaHiren
10 views16 slides

Similar to Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approaches. (20)

Looking Inside Mechanistic Models of Carcinogenesis by Sascha Zöllner
Looking Inside Mechanistic Models of CarcinogenesisLooking Inside Mechanistic Models of Carcinogenesis
Looking Inside Mechanistic Models of Carcinogenesis
Sascha Zöllner365 views
extreme times in finance heston model.ppt by ArounaGanou2
extreme times in finance heston model.pptextreme times in finance heston model.ppt
extreme times in finance heston model.ppt
ArounaGanou216 views
Sequential Monte Carlo algorithms for agent-based models of disease transmission by JeremyHeng10
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
JeremyHeng1062 views
Computational Information Geometry on Matrix Manifolds (ICTP 2013) by Frank Nielsen
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Frank Nielsen741 views
Ray : modeling dynamic systems by Houw Liong The
Ray : modeling dynamic systemsRay : modeling dynamic systems
Ray : modeling dynamic systems
Houw Liong The936 views
Talk in BayesComp 2018 by JeremyHeng10
Talk in BayesComp 2018Talk in BayesComp 2018
Talk in BayesComp 2018
JeremyHeng10173 views
Formulas statistics by Prashi_Jain
Formulas statisticsFormulas statistics
Formulas statistics
Prashi_Jain2K views
Controlled sequential Monte Carlo by JeremyHeng10
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo
JeremyHeng1090 views
Seismic data processing lecture 3 by Amin khalil
Seismic data processing lecture 3Seismic data processing lecture 3
Seismic data processing lecture 3
Amin khalil2.5K views
Sequential Monte Carlo algorithms for agent-based models of disease transmission by JeremyHeng10
Sequential Monte Carlo algorithms for agent-based models of disease transmissionSequential Monte Carlo algorithms for agent-based models of disease transmission
Sequential Monte Carlo algorithms for agent-based models of disease transmission
JeremyHeng1038 views
Low rank tensor approximation of probability density and characteristic funct... by Alexander Litvinenko
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
The Multivariate Gaussian Probability Distribution by Pedro222284
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
Pedro222284161 views
Lecture: Monte Carlo Methods by Frank Kienle
Lecture: Monte Carlo MethodsLecture: Monte Carlo Methods
Lecture: Monte Carlo Methods
Frank Kienle1.7K views
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi... by Alexander Litvinenko
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
A series of maximum entropy upper bounds of the differential entropy by Frank Nielsen
A series of maximum entropy upper bounds of the differential entropyA series of maximum entropy upper bounds of the differential entropy
A series of maximum entropy upper bounds of the differential entropy
Frank Nielsen236 views

Recently uploaded

Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveNetwork Automation Forum
43 views35 slides
Info Session November 2023.pdf by
Info Session November 2023.pdfInfo Session November 2023.pdf
Info Session November 2023.pdfAleksandraKoprivica4
15 views15 slides
Kyo - Functional Scala 2023.pdf by
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfFlavio W. Brasil
418 views92 slides
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdfDr. Jimmy Schwarzkopf
24 views29 slides
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensorssugiuralab
23 views15 slides
The Forbidden VPN Secrets.pdf by
The Forbidden VPN Secrets.pdfThe Forbidden VPN Secrets.pdf
The Forbidden VPN Secrets.pdfMariam Shaba
20 views72 slides

Recently uploaded(20)

Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive by Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab23 views
The Forbidden VPN Secrets.pdf by Mariam Shaba
The Forbidden VPN Secrets.pdfThe Forbidden VPN Secrets.pdf
The Forbidden VPN Secrets.pdf
Mariam Shaba20 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays33 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2218 views
HTTP headers that make your website go faster - devs.gent November 2023 by Thijs Feryn
HTTP headers that make your website go faster - devs.gent November 2023HTTP headers that make your website go faster - devs.gent November 2023
HTTP headers that make your website go faster - devs.gent November 2023
Thijs Feryn26 views
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe by Simone Puorto
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe
Simone Puorto13 views
Business Analyst Series 2023 - Week 3 Session 5 by DianaGray10
Business Analyst Series 2023 -  Week 3 Session 5Business Analyst Series 2023 -  Week 3 Session 5
Business Analyst Series 2023 - Week 3 Session 5
DianaGray10345 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software317 views
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays24 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker48 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc72 views
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... by The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...

Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approaches.

  • 1. Allele frequencies as Stochastic Processes Mathematical and Statistical Approaches Gota Morota Nov 30, 2010 1 / 32
  • 2. Outline Change of Allele Frequencies as Stochastic Processes Steady State Distributions of Allele Frequencies Time Series Analysis 2 / 32
  • 3. Outline Change of Allele Frequencies as Stochastic Processes Steady State Distributions of Allele Frequencies Time Series Analysis 3 / 32
  • 4. Outline Change of Allele Frequencies as Stochastic Processes Steady State Distributions of Allele Frequencies Time Series Analysis 4 / 32
  • 5. Various factors affecting allele frequencies • Selection, mutation and migration (cross breedings) ⇒ systematic pressures (Wright 1949) • Random fluctuations 1. Random sampling of gametes (genetic drift) 2. Random fluctuation in systematic pressures ⇓ Allele frequencies are funcions of the systematic forces and the random components 5 / 32
  • 6. Random walk ⇒ Brownian Motion 0.10 −0.010 0.05 −0.015 0.00 −0.020 −0.05 −0.025 −0.10 −0.030 −0.15 −0.20 −0.035 −0.25 −0.040 200 2 4 6 8 400 10 600 800 1000 Time Time Figure 3: Time = [1:1000] Figure 1: Time = [1,10] 0.8 −0.02 0.6 −0.04 0.4 −0.06 0.2 −0.08 0.0 −0.10 −0.2 2000 20 40 60 80 100 Time Figure 2: Time = [1:100] 4000 6000 8000 10000 Time Figure 4: Time = [1:10000] 6 / 32
  • 7. Brownian Motion ⇒ Diffusion Model 0.8 0.6 + conditional on forces 0.4 0.2 Systematic 0.0 −0.2 2000 4000 6000 8000 10000 Time Figure 5: Time = [1:10000] • treat change of allele frequencies as stochastic porcess ⇓ Diffusion Model 7 / 32
  • 8. Diffusion Model Allele Frequency It frames infinite number of paths that allele fequencies would take over time under certain systematic pressures. 0 2000 4000 6000 8000 10000 6000 8000 10000 6000 8000 10000 Allele Frequency Time 0 2000 4000 Allele Frequency Time 0 2000 4000 Time • pick up single time point t (say 5000 in above) • try to find PDF at point t • need to solve partial differntial equation (PDE) • Fokker-Planck Equation! 8 / 32
  • 9. Fokker-Planck Equation • Derived from a continuous time stochastic process (X) • Partial differential equation ∂ ∂φ(p , x ; t ) 1 ∂2 {Vδx φ(p , x ; t )} − = {Mδx φ(p , x ; t )} 2 ∂t 2 ∂x ∂x (1) where • p: initial allele frequency (fixed) • x: allele frequency (random variable) • t: time (continuous variable) • φ(p , x ; t ): PDF • Vδx : variance of δx (amount of change in allele frequency per time) • Mδx : mean of δx (amount of change in allele frequency per time) • Vδx and Mδx : both may depend on x and t 9 / 32
  • 10. Fokker-Planck Equation for Brownian Motion A standard Brownian motion can be constructed from random walk with error having mean 0 and variance 1 under right scaling. It has the PDF of N(0, t). • when t = 1.0, N(0, 1) • when t = 1.5, N(0, 1.5) Fokker-Planck equation: ∂φ(p , x ; t ) 1 ∂2 = φ(p , x ; t ) ∂t 2 ∂x 2 = Heat equation (2) (3) Mδx = 0 and Vδx = 1 in equation (1) Solution: φ(p .x ; t ) = √ 1 2πt exp −x 2 2t (4) 10 / 32
  • 11. Solution of the Heat Equation (the Heat Kernel) t = 0.00001 t = 0.01 t=0.1 t=1 t=10 −2 −1 0 1 2 x 11 / 32
  • 12. Under Random Genetic Drift Mδx = 0 Vδx = x (1 − x ) 2Ne Fokker-Planck equation for random genetic drift: ∂φ(p , x ; t ) 1 ∂2 x (1 − x )φ(p , x ; t ) = ∂t 4Ne ∂x 2 (5) Solutions are obtained as infinite series of sum by... • Kimura (1955) Hypergeometric function • Korn and Korn (1968) Gegenbauer polynomial φ = 6p (1 − p )exp −1 2Ne t + 30p (1 − p )(1 − 2p )(1 − 2x ) −3 2Ne t + ··· , 12 / 32
  • 13. Solution of FPE (Kimura 1955) VOL. 41) 1955 GENETICS: MOTOO KIMURA 149 FIGS. 1-2.-The processes of the change in the probability distribution of heterallelic classes, due to random sampling of gametes in reproduction. It is assumed that the population starts from the gene frequency 0.5 in Fig. 1 (left) and 0.1 in Fig. 2 (right). t = time in generation; N = effective size of the population; abscissa is gene frequency; ordinate is probability density. 13 / 32
  • 14. Under Selection and Random Genetic Drift $M_{\delta x} = s\,x(1-x)$, $V_{\delta x} = \frac{x(1-x)}{2N_e}$ $\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1-x)\,\phi(p, x; t)\right\} - s\frac{\partial}{\partial x}\left\{x(1-x)\,\phi(p, x; t)\right\}$ (6) Solutions are obtained as infinite series of oblate spheroidal wave functions after transforming allele frequency to z = 1 − 2x • Kimura (1955) • Kimura and Crow (1956) $\phi(p, x; t) = \sum_{k=0}^{\infty} C_k \exp(-\lambda_k t + 2cx)\,V_{1k}^{(1)}(z)$ (7) where $V_{1k}^{(1)}(z) = {\sum_{n=0,1}}' f_n^{k}\,T_n^{1}(z)$
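$M_{\delta x}$ and $V_{\delta x}$ are the mean and variance of the change in allele frequency per generation. This sketch checks that reading against one exact generation. It assumes genic selection (relative fitness 1 + s per copy of A) followed by binomial sampling of 2Ne gene copies. The model and the helper name are illustrative assumptions, not taken from the slide.

```python
from math import comb

def one_generation_moments(x, s, ne):
    """Exact mean and variance of the one-generation change in allele frequency:
    genic selection (fitness 1 + s per A copy), then binomial sampling of 2*ne copies."""
    m = 2 * ne
    x_sel = x * (1 + s) / (1 + s * x)          # frequency of A after selection
    probs = [comb(m, j) * x_sel ** j * (1 - x_sel) ** (m - j) for j in range(m + 1)]
    mean = sum(pr * (j / m - x) for j, pr in enumerate(probs))
    var = sum(pr * (j / m - x - mean) ** 2 for j, pr in enumerate(probs))
    return mean, var

x, s, ne = 0.3, 0.01, 50
mean, var = one_generation_moments(x, s, ne)
print(f"mean change={mean:.6f}  vs  M = s x(1-x) = {s * x * (1 - x):.6f}")
print(f"variance   ={var:.6f}  vs  V = x(1-x)/(2Ne) = {x * (1 - x) / (2 * ne):.6f}")
```

The exact mean change is $s\,x(1-x)/(1+sx)$, so it agrees with $M_{\delta x}$ to first order in s. The diffusion keeps only those leading terms.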
  • 15. Kolmogorov Backward Equation • Derived from a continuous-time stochastic process (P) • Partial differential equation: $\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}V_{\delta p}\frac{\partial^2 \phi(p, x; t)}{\partial p^2} + M_{\delta p}\frac{\partial \phi(p, x; t)}{\partial p}$ (8) where • $p$: initial allele frequency (the variable of the equation) • $x$: allele frequency at time $t$ (held fixed) • $t$: time (continuous variable) • $\phi(p, x; t)$: PDF • $V_{\delta p}$: variance of $\delta p$ (the change in allele frequency per unit time) • $M_{\delta p}$: mean of $\delta p$ • $V_{\delta p}$ and $M_{\delta p}$ may depend on $p$ but not on $t$ (time-homogeneous process)
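The backward equation is convenient when the end state is fixed and the starting frequency p varies. A standard example, not on the slide, is the probability u(p) that A is eventually fixed. It satisfies the t → ∞ form $\frac{1}{2}V_{\delta p}u'' + M_{\delta p}u' = 0$ with u(0) = 0 and u(1) = 1. With $M_{\delta p} = s\,p(1-p)$ and $V_{\delta p} = p(1-p)/(2N_e)$ (the selection-plus-drift coefficients of the previous slide, written in p), the solution is $u(p) = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$. The sketch below evaluates this solution and checks the ODE by finite differences. The helper names are hypothetical.

```python
import math

def fixation_probability(p, s, ne):
    """u(p) = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)), solving 1/2 V u'' + M u' = 0
    with M = s p(1-p), V = p(1-p)/(2 Ne), u(0) = 0 and u(1) = 1."""
    a = 4.0 * ne * s
    if a == 0.0:
        return p                      # neutral case: u(p) = p
    return math.expm1(-a * p) / math.expm1(-a)

def backward_residual(p, s, ne, h=1e-4):
    """Finite-difference value of 1/2 V u'' + M u' at p (should be close to 0)."""
    def u(q):
        return fixation_probability(q, s, ne)
    du = (u(p + h) - u(p - h)) / (2 * h)
    d2u = (u(p + h) - 2 * u(p) + u(p - h)) / h ** 2
    M = s * p * (1 - p)
    V = p * (1 - p) / (2 * ne)
    return 0.5 * V * d2u + M * du

for s in (-0.01, 0.0, 0.01):
    print(f"s={s:+.2f}: u(0.1)={fixation_probability(0.1, s, 50):.4f}  "
          f"residual={backward_residual(0.1, s, 50):.1e}")
```

`math.expm1` keeps the formula accurate when $4N_e s$ is tiny, where it correctly approaches the neutral value u(p) = p.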
  • 16. Steady State Distribution of Allele Frequencies Equilibrium • a single point (a balance between the various forces that keeps allele frequencies near the equilibrium) • a PDF ⇓ the PDF of a stable equilibrium instead of a single point: the steady-state allele frequency distribution • Fisher (1922), (1930) • Wright (1931), (1937), (1938) $\phi(p, x; t)$ = solution of a Fokker-Planck equation (9) $\lim_{t \to \infty} \phi(p, x; t) = \phi(x)$ (10) $\phi(x) = \frac{C}{V_{\delta x}}\exp\left(2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx\right)$ (11) where C is a normalizing constant that makes φ(x) integrate to 1
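Equation (11) can be evaluated numerically for any $M_{\delta x}$ and $V_{\delta x}$. This sketch uses hypothetical helpers and Simpson's rule for the integral. It leaves C unset, so it compares ratios of densities. Those ratios are checked against the closed form $x^{4N_e v-1}(1-x)^{4N_e u-1}$ that the mutation-and-drift case gives without selection.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f from a to b (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def wright_steady_state(M, V, x, x_ref=0.5):
    """Unnormalised phi(x) from equation (11) with C = 1; the lower limit x_ref
    of the integral only changes the constant C."""
    return math.exp(2.0 * simpson(lambda y: M(y) / V(y), x_ref, x)) / V(x)

# Mutation and drift only: M = -u x + v (1 - x), V = x (1 - x) / (2 Ne)
ne, u, v = 100, 0.002, 0.001

def M(x):
    return -u * x + v * (1 - x)

def V(x):
    return x * (1 - x) / (2 * ne)

def beta_kernel(x):
    # closed form x^(4 Ne v - 1) (1 - x)^(4 Ne u - 1) for the same M and V
    return x ** (4 * ne * v - 1) * (1 - x) ** (4 * ne * u - 1)

for x in (0.05, 0.2, 0.8):
    numeric = wright_steady_state(M, V, x) / wright_steady_state(M, V, 0.5)
    closed = beta_kernel(x) / beta_kernel(0.5)
    print(f"x={x}: numeric ratio={numeric:.6f}, closed-form ratio={closed:.6f}")
```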
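  • 17. Steady State Distribution – Random Genetic Drift For large t, only the first few terms of the series determine the form of the PDF, because the higher terms decay faster: $\phi = 6p(1-p)\exp\left(-\frac{t}{2N_e}\right) + 30p(1-p)(1-2p)(1-2x)\exp\left(-\frac{3t}{2N_e}\right) + \cdots$ Asymptotic formula: $\phi \approx C\exp\left(-\frac{t}{2N_e}\right)$ as $t \to \infty$, with $C = 6p(1-p)$ independent of x, so the distribution of unfixed classes becomes flat while its height decays at rate $1/(2N_e)$ per generation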
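The rate $1/(2N_e)$ can be checked against a discrete model. This sketch assumes a Wright–Fisher population of 2Ne gene copies and iterates its exact binomial transition probabilities, so there is no simulation noise. It tracks the probability that the allele is still segregating. For this discrete model, the late per-generation decay factor is $1 - 1/(2N_e)$, which the diffusion approximates by $e^{-1/(2N_e)}$. The function name is hypothetical.

```python
from math import comb, exp

def segregating_mass(ne, start_count, generations):
    """Exact Wright-Fisher iteration for 2*ne gene copies: for each generation, the
    probability that the allele is neither lost nor fixed."""
    m = 2 * ne
    rows = [[comb(m, j) * (i / m) ** j * (1 - i / m) ** (m - j) for j in range(m + 1)]
            for i in range(m + 1)]
    dist = [0.0] * (m + 1)
    dist[start_count] = 1.0
    masses = []
    for _ in range(generations):
        new = [0.0] * (m + 1)
        for i, prob in enumerate(dist):
            if prob > 0.0:
                for j, r in enumerate(rows[i]):
                    new[j] += prob * r
        dist = new
        masses.append(sum(dist[1:m]))
    return masses

ne = 10
masses = segregating_mass(ne, start_count=ne, generations=100)   # start at p = 0.5
ratio = masses[-1] / masses[-2]
print(f"late decay factor per generation: {ratio:.5f}")
print(f"1 - 1/(2Ne) = {1 - 1 / (2 * ne):.5f}, exp(-1/(2Ne)) = {exp(-1 / (2 * ne)):.5f}")
```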
  • 18. Graphical Representation (Wright 1931) [Figure 3 from Wright (1931); abscissa: factor frequency (25%, 50%, 75%). Caption: Distribution of gene frequencies in an isolated population in which fixation and loss of genes each is proceeding at the rate 1/4N in the absence of appreciable selection or mutation.]
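  • 19. Steady State Distribution – Selection and Mutation $M_{\delta x} = -ux + v(1-x) + \frac{x(1-x)}{2}\frac{d\bar{a}}{dx}$, $V_{\delta x} = \frac{x(1-x)}{2N_e}$, where u and v are the mutation rates from A to a and from a to A, and $\bar{a}$ is the mean selective value: $\phi(x) = C\exp(2N_e\bar{a})\,x^{4N_e v-1}(1-x)^{4N_e u-1}$ (12) When A has a selective advantage s over a (values 2s, s and 0 for AA, Aa and aa): $\bar{a} = 2s\,x^2 + s\cdot 2x(1-x) + 0\cdot(1-x)^2 = 2sx$ $\phi(x) = C\exp(4N_e s x)\,x^{4N_e v-1}(1-x)^{4N_e u-1}$ (13)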
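Equation (13) can be normalized numerically to see how selection shifts the steady state. This sketch uses a hypothetical helper and the midpoint rule. For simplicity it chooses parameters with $4N_e u = 4N_e v = 2$, so the density stays finite at both ends. With s = 0 the distribution is symmetric about 0.5; with s > 0 it is pulled toward the favored allele A.

```python
import math

def stationary_summary(ne, s, u, v, n=20000):
    """Normalise equation (13), phi(x) = C exp(4 Ne s x) x^(4 Ne v - 1) (1 - x)^(4 Ne u - 1),
    with the midpoint rule and return (C, mean allele frequency).
    Assumes 4 Ne u >= 1 and 4 Ne v >= 1, so phi has no singularity at 0 or 1."""
    h = 1.0 / n
    xs = [(k + 0.5) * h for k in range(n)]
    kernel = [math.exp(4 * ne * s * x) * x ** (4 * ne * v - 1) * (1 - x) ** (4 * ne * u - 1)
              for x in xs]
    area = sum(kernel) * h
    mean = sum(x * w for x, w in zip(xs, kernel)) * h / area
    return 1.0 / area, mean

for s in (0.0, 0.01):
    c, mean = stationary_summary(ne=50, s=s, u=0.01, v=0.01)
    print(f"s={s}: C={c:.4f}, mean allele frequency={mean:.4f}")
```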
  • 20. Graphical Representation (Wright 1937) [Figs. 1, 2, 4, 5 and 6 from Wright (1937), Proc. N. A. S., p. 308]
  • 21. Time Series Analysis When a variable is measured sequentially in time, the resulting data form a time series. • Diffusion model – continuous-time stochastic process • Time series – discrete-time stochastic process
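  • 22. Basic Models Observations close together in time tend to be correlated. • Autoregressive model, AR(p): $X_t = c + \sum_{i=1}^{p}\psi_i X_{t-i} + \epsilon_t$ (14) • Moving average model, MA(q): $X_t = c + \sum_{i=1}^{q}\theta_i\epsilon_{t-i} + \epsilon_t$ (15) • Autoregressive moving average model, ARMA(p, q): $X_t = c + \sum_{i=1}^{p}\psi_i X_{t-i} + \sum_{i=1}^{q}\theta_i\epsilon_{t-i} + \epsilon_t$, i.e. AR(p) + MA(q) (16) where $\epsilon_t$ is white noise (uncorrelated errors with mean 0)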
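This minimal simulation sketch implements equations (14) to (16) with Gaussian white noise and adds a sample autocorrelation to show that nearby observations are correlated. The helper names, the burn-in length and the noise distribution are assumptions for illustration. For AR(1) the lag-1 autocorrelation should be close to $\psi_1$. For MA(1) it should be close to $\theta_1/(1+\theta_1^2)$ and near zero beyond lag 1.

```python
import random

def simulate_arma(n, psi=(), theta=(), c=0.0, sigma=1.0, burn_in=500, seed=0):
    """Simulate X_t = c + sum_i psi_i X_{t-i} + sum_i theta_i e_{t-i} + e_t, e_t ~ N(0, sigma^2).
    The first burn_in values are discarded so the zero start is forgotten."""
    rng = random.Random(seed)
    x, e = [], []
    for t in range(n + burn_in):
        eps = rng.gauss(0.0, sigma)
        ar = sum(coef * x[t - i] for i, coef in enumerate(psi, start=1) if t - i >= 0)
        ma = sum(coef * e[t - i] for i, coef in enumerate(theta, start=1) if t - i >= 0)
        x.append(c + ar + ma + eps)
        e.append(eps)
    return x[burn_in:]

def sample_acf(x, lag):
    """Sample autocorrelation at a single lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / denom

ar1 = simulate_arma(20000, psi=(0.6,), seed=1)
ma1 = simulate_arma(20000, theta=(0.5,), seed=2)
print("AR(1):", [round(sample_acf(ar1, k), 3) for k in (1, 2, 3)])   # theory: 0.6, 0.36, 0.216
print("MA(1):", [round(sample_acf(ma1, k), 3) for k in (1, 2, 3)])   # theory: 0.4, 0, 0
```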
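  • 23. Time Series as a Polynomial Equation Backshift operator: $B^k X_t = X_{t-k}$ (constant c taken as 0) • AR(p): $X_t = \psi_1 X_{t-1} + \cdots + \psi_p X_{t-p} + \epsilon_t$, so $X_t = (\psi_1 B + \cdots + \psi_p B^p)X_t + \epsilon_t$ and $(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = \epsilon_t$ • ARMA(p, q): $X_t - \psi_1 X_{t-1} - \cdots - \psi_p X_{t-p} = \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_q\epsilon_{t-q}$, so $(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t$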
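The operator form can be checked numerically. This sketch builds an AR(2) series from known noise, then applies the lag polynomial $1 - \psi_1 B - \psi_2 B^2$ and recovers the noise. It also uses the standard fact that an AR process is stationary when all roots of $1 - \psi_1 z - \cdots - \psi_p z^p$ lie outside the unit circle. The roots are computed here only for the AR(2) case. Helper names are hypothetical.

```python
import cmath
import random

def apply_lag_polynomial(coeffs, series):
    """Apply c0 + c1 B + c2 B^2 + ... to a series: y_t = sum_k coeffs[k] * series[t - k].
    The first len(coeffs) - 1 values need pre-sample data and are dropped."""
    order = len(coeffs) - 1
    return [sum(c * series[t - k] for k, c in enumerate(coeffs))
            for t in range(order, len(series))]

def ar2_roots(psi1, psi2):
    """Roots of the AR(2) polynomial 1 - psi1 z - psi2 z^2 (requires psi2 != 0)."""
    disc = cmath.sqrt(psi1 ** 2 + 4 * psi2)
    return (psi1 + disc) / (-2 * psi2), (psi1 - disc) / (-2 * psi2)

def is_stationary_ar2(psi1, psi2):
    """Stationary when both roots lie outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(psi1, psi2))

# AR(2) built from known noise: X_t = psi1 X_{t-1} + psi2 X_{t-2} + eps_t (zero start)
rng = random.Random(0)
psi1, psi2 = 0.5, 0.3
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]
x = []
for t, e in enumerate(eps):
    prev1 = x[t - 1] if t >= 1 else 0.0
    prev2 = x[t - 2] if t >= 2 else 0.0
    x.append(psi1 * prev1 + psi2 * prev2 + e)

# (1 - psi1 B - psi2 B^2) X_t should give back eps_t for t >= 2
recovered = apply_lag_polynomial([1.0, -psi1, -psi2], x)
print(max(abs(r - e) for r, e in zip(recovered, eps[2:])))
print(is_stationary_ar2(0.5, 0.3), is_stationary_ar2(0.5, 0.6))
```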
  • 24. Stationary Process The mean and variance do not change over time, and the autocorrelation depends only on the lag; there is no trend. [Figure 6: Random walk, not stationary (time 0 to 10000)] [Figure 7: Detrended series, looks stationary (time 0 to 10000)] Detrending: • linear regression • taking a difference • Autoregressive Integrated Moving Average: ARIMA(p, d, q), where d is the number of differences taken
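This sketch shows why a random walk is not stationary and why differencing helps. Across many replicate walks, the variance at time t grows like t. The first difference $(1 - B)X_t$ returns the original steps, whose variance is constant. The helper names and the Gaussian steps are assumptions.

```python
import random

def random_walk(n, rng):
    """Return (steps, walk) where walk[t] = steps[0] + ... + steps[t]."""
    steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    walk, level = [], 0.0
    for step in steps:
        level += step
        walk.append(level)
    return steps, walk

def difference(series, lag=1):
    """Apply (1 - B^lag): series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def cross_sectional_variance(walks, index):
    """Variance across replicate walks of the value at one time index."""
    values = [walk[index] for _, walk in walks]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(42)
walks = [random_walk(400, rng) for _ in range(2000)]
var_100 = cross_sectional_variance(walks, 99)    # after 100 unit-variance steps
var_400 = cross_sectional_variance(walks, 399)   # after 400 unit-variance steps
steps, walk = walks[0]
diffs = difference(walk)                          # equals steps[1:] up to rounding
print(f"Var at t=100: {var_100:.1f}, Var at t=400: {var_400:.1f}")
```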
  • 25. Application on Allele Frequencies • Influential SNPs – indicative of deterministic trends • Uninfluential SNPs – random fluctuation? • Diffusion model – assumes a Markov process • Time series – which model describes the process of change in allele frequencies? Application • Objective: model the process of change in allele frequencies • Data: SNP genotypes of 4,798 Holstein bulls at 38,416 markers, and milk yield • Genotype imputation: FastPhase 1.4 • Estimation of marker effects: BayesCπ
  • 26. BayesCπ [Figure 8: GAW17. Title page of "Analysis of human mini-exome sequencing data using a Bayesian hierarchical mixture model: Genetic Analysis Workshop 17" by Bueno Filho JS (1, 2, ∗), Morota G (1, ∗), Tran QT (3), Maenner MJ (4), Vera-Cala LM (4, 5), Engelman CD (4, §) and Meyers KJ (4, §). (1) Department of Dairy Science, University of Wisconsin-Madison, USA; (2) Departamento de Ciências Exatas, Universidade Federal de Lavras, Brasil; (3) Department of Statistics, University of Wisconsin-Madison, USA; (4) Department of Population Health Sciences, University of Wisconsin-Madison, USA; (5) Departamento de Salud Publica, Universidad Industrial de Santander, Colombia. ∗ Contributed equally to this work; § corresponding author.]
  • 27. Allele Frequency of the Top Marker [Figure 9: Time plots of allele frequencies over time points 0 to 30. Top: original series (allele frequency roughly 0.4 to 0.8). Bottom: detrended series (roughly −0.15 to 0.15), obtained by taking the first-order difference.]
  • 28. Autocorrelation and Partial Autocorrelation ARIMA(1,1,1)? [Figure 10: ACF and PACF of the original series and of the first-order difference series, lags up to 14]
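Sample ACF and PACF values like those in Figure 10 can be computed with a short sketch. The PACF uses the Durbin–Levinson recursion on the autocorrelations. The ±1.96/√n band is the usual approximate 95% bound for white noise. Helper names are hypothetical. The usage applies the recursion to theoretical autocorrelations, where the answers are known: an AR(1) PACF cuts off after lag 1, while an MA(1) PACF does not.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 = 1, r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]

def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_11, ..., phi_kk from rho[0..max_lag] (Durbin-Levinson)."""
    pacf, prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            current = [phi_kk]
        else:
            num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = num / den
            current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
        prev = current
    return pacf

def significance_bound(n):
    """Approximate 95% bound +/- 1.96 / sqrt(n) used on ACF/PACF plots for white noise."""
    return 1.96 / math.sqrt(n)

ar1_pacf = pacf_from_acf([0.6 ** k for k in range(6)], 5)      # AR(1), psi = 0.6
ma1_rho = [1.0, 0.5 / (1 + 0.5 ** 2), 0.0, 0.0]                 # MA(1), theta = 0.5
ma1_pacf = pacf_from_acf(ma1_rho, 3)
print([round(v, 3) for v in ar1_pacf])
print([round(v, 3) for v in ma1_pacf])
print(round(significance_bound(30), 3))                        # for a series of 30 points
```

For observed data, `pacf_from_acf(sample_acf(series, 14), 14)` gives the sample PACF up to lag 14.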
  • 29. Model Selection Table 1: Comparison of several competing models
      Model           AIC
      ARIMA(1,0,0)    −51.56
      ARIMA(0,1,0)    −49.38
      ARIMA(0,0,1)    −46.41
      ARIMA(1,1,0)    −52.47
      ARIMA(1,0,1)    −51.13
      ARIMA(1,1,1)    −51.02
    Lowest AIC: ARIMA(1,1,0), fitted as $\nabla X_t = 0.635\,\nabla X_{t-1} + \epsilon_t$, where $\nabla X_t = X_t - X_{t-1}$ is the first difference
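The mechanics of an AIC comparison like Table 1 can be sketched on simulated data. This is a simplified conditional-least-squares fit without a constant, not the exact maximum-likelihood procedure behind Table 1, and the helper names are hypothetical. ARIMA(1,1,0) regresses the first difference on its own lag, and ARIMA(0,1,0) treats the differences as white noise. AIC = −2 log L + 2k, where k also counts $\sigma^2$ as a parameter. The simulated coefficient 0.635 is borrowed from the slide only as a convenient value.

```python
import math
import random

def simulate_arima110(n, psi, seed=0):
    """X_t with (1 - psi B)(1 - B) X_t = eps_t: the first differences follow an AR(1)."""
    rng = random.Random(seed)
    x, d = [0.0], 0.0
    for _ in range(n):
        d = psi * d + rng.gauss(0.0, 1.0)
        x.append(x[-1] + d)
    return x

def gaussian_aic(residuals, n_params):
    """AIC = -2 log L + 2 k for i.i.d. normal residuals with MLE variance RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    return -2.0 * log_lik + 2 * n_params

def fit_arima110(x):
    """Conditional least squares: regress the first difference on its own lag (no constant)."""
    d = [b - a for a, b in zip(x, x[1:])]
    lagged, current = d[:-1], d[1:]
    psi = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    residuals = [b - psi * a for a, b in zip(lagged, current)]
    return psi, gaussian_aic(residuals, n_params=2)      # psi and sigma^2

def fit_arima010(x):
    """Random walk: the first differences themselves are the residuals."""
    d = [b - a for a, b in zip(x, x[1:])]
    return gaussian_aic(d[1:], n_params=1)               # sigma^2 only; same sample as above

x = simulate_arima110(5000, psi=0.635, seed=7)
psi_hat, aic_110 = fit_arima110(x)
aic_010 = fit_arima010(x)
print(f"psi_hat={psi_hat:.3f}  AIC ARIMA(1,1,0)={aic_110:.1f}  AIC ARIMA(0,1,0)={aic_010:.1f}")
```

Both fits use the same residual sample, so their AIC values are comparable. Models with different orders of differencing are fitted to different data, so their AICs should be compared with care.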
  • 30. Advanced Models Time-dependent variance • ARCH (Autoregressive Conditional Heteroskedasticity) • GARCH (Generalized Autoregressive Conditional Heteroskedasticity) Multivariate • VARMA (Vector Autoregressive Moving Average) • BVARMA (Bayesian Vector Autoregressive Moving Average)
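This is a minimal GARCH(1,1) simulation sketch, assuming Gaussian innovations. The conditional variance $\sigma_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta\sigma_{t-1}^2$ changes over time. The unconditional variance is $\omega/(1-\alpha-\beta)$ when $\alpha + \beta < 1$. The parameter values and the helper name are illustrative.

```python
import random

def simulate_garch11(n, omega, alpha, beta, burn_in=1000, seed=0):
    """GARCH(1,1): eps_t = sigma_t z_t with z_t ~ N(0, 1) and
    sigma_t^2 = omega + alpha eps_{t-1}^2 + beta sigma_{t-1}^2."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)     # start near the unconditional variance
    eps_prev = 0.0
    eps_out, sigma2_out = [], []
    for t in range(n + burn_in):
        sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            eps_out.append(eps)
            sigma2_out.append(sigma2)
        eps_prev = eps
    return eps_out, sigma2_out

omega, alpha, beta = 0.1, 0.1, 0.8
eps, sigma2 = simulate_garch11(50000, omega, alpha, beta, seed=3)
sample_var = sum(e * e for e in eps) / len(eps)
print(f"sample variance={sample_var:.3f}, omega/(1-alpha-beta)={omega / (1 - alpha - beta):.3f}")
print(f"conditional variance ranges from {min(sigma2):.3f} to {max(sigma2):.3f}")
```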
  • 31. Intersection of Mathematics and Statistics Under certain conditions (as the time step between observations shrinks to zero), GARCH(1,1) ≈ diffusion model!