2. Mo'va'on
“Molecular
signatures
of
natural
selec0on”,
Nielsen
2005:
“inferences
regarding
the
paAerns
and
distribu'on
of
selec'on
in
genes
and
genomes
may
provide
important
func'onal
informa'on”
Wikipedia
entry
on
“sta0s0cal
thermodynamics”:
“The
goal
of
sta's'cal
thermodynamics
is
to
understand
and
to
interpret
the
measurable
macroscopic
proper'es
of
materials
in
terms
of
the
proper'es
of
their
cons'tuent
par'cles
and
the
interac'ons
between
them”
3. Mo'va'on
The
func'onal
importance
of
a
gene'c
sequence
can
be
inferred
by
its
popula'on
distribu'on
(for
example
from
its
degree
of
conserva'on).
Can
anything
more
be
said
about
the
gene’s
specific
func'on
(survival,
reproduc'on,
etc)?
distribu)on
of
genes
gene
func)on
macroscopic
proper)es
of
materials
proper)es
of
their
cons)tuent
par)cles
(size,
speed,
etc)
4. Moran
model:
Model
varia)on:
birth
death
change
in
frequency
for
a
given
phenotype
5. €
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
We
have
two
phenotypes,
P1
and
P2
€
€
p−
, p+
: death and birth probabilities for P1
q−
, q+
: death and birth probabilities for P2
If
p - + q - < 1
in
some
intervals
nothing
happens.
If
a
death
happens,
a
birth
happens
instantaneously
(“musical
chairs”
process).
6. €
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
We
have
two
phenotypes,
P1
and
P2
€
€
p−
, p+
: death and birth probabilities for P1
q−
, q+
: death and birth probabilities for P2
If
p - + q - < 1
in
some
intervals
nothing
happens.
If
a
death
happens,
a
birth
happens
instantaneously
(“musical
chairs”
process).
loop
7. €
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
We
have
two
phenotypes,
P1
and
P2
€
€
p−
, p+
: death and birth probabilities for P1
q−
, q+
: death and birth probabilities for P2
If
p - + q - < 1
in
some
intervals
nothing
happens.
If
a
death
happens,
a
birth
happens
instantaneously
(“musical
chairs”
process).
loop
no
loop
8. €
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
We
have
two
phenotypes,
P1
and
P2
€
€
p−
, p+
: death and birth probabilities for P1
q−
, q+
: death and birth probabilities for P2
If
p - + q - < 1
in
some
intervals
nothing
happens.
If
a
death
happens,
a
birth
happens
instantaneously
(“musical
chairs”
process).
loop
no
loop
9. €
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
We
differen'ate
the
phenotypes’
reproduc've
fitness
and
life'mes
independently,
and
consider
reproduc)on
and
survival
as
two
different
func'ons.
€
W1 : offspring for type 1,
W2 : offspring for type 2,
⎧
⎨
⎩
T1 : average lifespan for type 1,
T2 : average lifespan for type 2.
⎧
⎨
⎩
reproduc,on
survival
u : mutation probability
and
we
are
interested
in
the
equilibrium
distribu'on
of
a
process
with
symmetric
reversible
muta)on
for
haploid
individuals
10. q+
=
W1
T1
u x +
W2
T2
(1− u)(1− x)
W1
T1
x +
W2
T2
(1− x)
.
€
q−
=
1− x
T2
.
p+
=
W1
T1
(1− u)x +
W2
T2
u (1− x)
W1
T1
x +
W2
T2
(1− x)
,
€
p−
=
x
T1
,
Death
probabili)es:
Birth
probabili)es:
€
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
Variable
x
is
the
frequency
of
phenotype
P1
(so
frequency
of
P2
is
1 - x )
11. q+
=
W1
T1
u x +
W2
T2
(1− u)(1− x)
W1
T1
x +
W2
T2
(1− x)
.
€
q−
=
1− x
T2
.
p+
=
W1
T1
(1− u)x +
W2
T2
u (1− x)
W1
T1
x +
W2
T2
(1− x)
,
€
p−
=
x
T1
,
Death
probabili)es:
Birth
probabili)es:
€
p−
€
p+
€
q−
€
q+
€
1− p−
− q−
€
p+
€
q+
€
⇒ p−
+ q−
< 1
Variable
x
is
the
frequency
of
phenotype
P1
(so
frequency
of
P2
is
1 - x )
12. €
Nu N → ∞⎯ →⎯⎯ θ,
The
model
therefore
depends
on:
To
get
a
non
trivial
distribu'on
the
following
asympto'c
constraints
are
imposed
on
the
parameters:
€
θ (mutation), s (reproduction), λ (survival).€
For notational convenience we also define λ =
T1
T2
.
€
N
W1
W2
−1
⎛
⎝
⎜
⎞
⎠
⎟ u→ 0
⎯ →⎯⎯ s.
Asympto)c
parameters
13. €
Nu N → ∞⎯ →⎯⎯ θ,
The
model
therefore
depends
on:
To
get
a
non
trivial
distribu'on
the
following
asympto'c
constraints
are
imposed
on
the
parameters:
€
θ (mutation), s (reproduction), λ (survival).€
For notational convenience we also define λ =
T1
T2
.
€
N
W1
W2
−1
⎛
⎝
⎜
⎞
⎠
⎟ u→ 0
⎯ →⎯⎯ s.
Asympto)c
parameters
14. €
M = E[Xt +1 − Xt ] =
1
NT1
⋅
θ λ2
(1− x)2
+ λ s x(1− x) −θ x2
x + λ (1− x)
,
V = E (Xt +1 − Xt )2
[ ] =
2 λ
NT1
⋅
x (1− x)
x + λ (1− x)
.
15. €
M = E[Xt +1 − Xt ] =
1
NT1
⋅
θ λ2
(1− x)2
+ λ s x(1− x) −θ x2
x + λ (1− x)
,
V = E (Xt +1 − Xt )2
[ ] =
2 λ
NT1
⋅
x (1− x)
x + λ (1− x)
.
16. €
€
φ(x)= C⋅
1
V
⋅ exp 2
M
V
dx∫
⎧
⎨
⎩
⎫
⎬
⎭
,
€
φ(x)= Ceα x
xλ θ −1
(1− x)
θ
λ
−1
x + λ(1− x){ },
€
α = s +θ
1
λ
− λ
⎛
⎝
⎜
⎞
⎠
⎟.
which
explicitly
gives
where
The
Wright
equilibrium
distribu'on
for
large
N
is
Equilibrium
distribu)on
17. €
€
φ(x)= C⋅
1
V
⋅ exp 2
M
V
dx∫
⎧
⎨
⎩
⎫
⎬
⎭
,
€
φ(x)= Ceα x
xλ θ −1
(1− x)
θ
λ
−1
x + λ(1− x){ },
€
α = s +θ
1
λ
− λ
⎛
⎝
⎜
⎞
⎠
⎟.
which
explicitly
gives
where
The
Wright
equilibrium
distribu'on
for
large
N
is
Equilibrium
distribu)on
18. Typical
shapes
for
equilibrium
distribu)ons
rela've
frequency
of
“blue”
phenotypes
probability
density
€
low mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
high mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
mutation probability close to
1
N
19. Typical
shapes
for
equilibrium
distribu)ons
rela've
frequency
of
“blue”
phenotypes
probability
density
€
low mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
high mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
mutation probability close to
1
N
20. “u”:
probability
of
muta'on
per
site
sta'onary
points
€
λ =
T1
T2
=1 (life'mes
are
equal
for
the
two
phenotypes)
random
driR
muta'on/selec'on
balance
€
1
N
Sta)onary
points
21. “u”:
probability
of
muta'on
per
site
sta'onary
points
€
λ =
T1
T2
=1 (life'mes
are
equal
for
the
two
phenotypes)
random
driR
muta'on/selec'on
balance
€
1
N
Sta)onary
points
sta'onary
points
€
λ =
T1
T2
=
3
2
“u”:
probability
of
muta'on
per
site
€
1
N
random
driR
muta'on/selec'on
balance
22. “u”:
probability
of
muta'on
per
site
sta'onary
points
€
λ =
T1
T2
=1 (life'mes
are
equal
for
the
two
phenotypes)
random
driR
muta'on/selec'on
balance
€
1
N
Sta)onary
points
random
driR
muta'on/selec'on
balance
sta'onary
points
€
λ =
T1
T2
=
3
2
“u”:
probability
of
muta'on
per
site
€
1
N
€
λ =
T1
T2
=
3
2
€
1
N
23. Sta)onary
points
sta'onary
points
€
λ =
T1
T2
=
3
2
“u”:
probability
of
muta'on
per
site
€
1
N
random
driR
muta'on/selec'on
balance
€
1
N
24. Typical
shapes
for
equilibrium
distribu)ons
rela've
frequency
of
“blue”
phenotypes
probability
density
€
low mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
high mutation
rela've
frequency
of
“blue”
phenotypes
probability
density
€
mutation probability close to
1
N
25. The
model
includes
one
more
parameter
than
the
standard
seTng,
so
it’s
desirable
expand
the
number
of
independent
sta)s)cs.
This
can
be
done
considering
the
amount
of
synonymous
varia'on
included
in
each
of
our
two
phenotypes,
and
considering
it
neutral
(as
done
by
Nielsen
and
colleagues
for
various
types
of
models).
The
amount
of
synonymous
varia'on
can
be
quan'fied
by
using
the
inbreeding
coefficient
concept.
€
Inferring λ =
T1
T2
from population statistics
26. phenotypes
P1
and
P2
genotypes
genotypes
Distribu)on
of
a
gene
throughout
a
popula)on
(Kreitman
1983)
27. x = relative frequency of P1
F1 = inbreeding coefficient for P1
F2 = inbreeding coefficient for P2
Sta)s)cal
quan))es
P1
P2
28. €
F1 /x +θ λ F1 /x − F1( )+ (L −1) F1[ ]− 1/x ≈ 0
F2 /(1− x) +θ
1
λ
F2 /(1− x) − F2( )+ (L −1) F2
⎡
⎣⎢
⎤
⎦⎥ − 1/(1− x) ≈ 0
A
result
by
Kimura
and
Crow
gives
that
for
only
one
phenotype
€
F ≈
1
1+ 2θ
This
can
be
extended
to
the
case
of
two
phenotypes
to
give
two
equa'ons:
where
θ
is
the
rescaled
muta'on
rate.
29. 0.5 1 1.5 20.75 1.25 1.75
0.5
1
1.5
2
.75
1.25
1.75
Real
Estimated
T1
= 2 T2
T1
= 1/2 T2
This
(hideous)
formula
can
be
used
to
derive
the
value
of
λ
from
combined
moments
of
quan''es
x ,
F1 ,
F2
es'mated
from
a
set
of
simulated
realiza'ons
of
the
process.
€
number of realizations per point =10000
€
running time = 5000 "generations"€
simulation parameters:
s = −5, θ = 7, N =1000, L (genotype length) = 40
and this relation leads to a quadratic equation that only admits one non-negative solution:
λ =
1
2Q1
(R − 1)(L − 1) +
(R − 1)2(L − 1)2 + 4RQ1Q2
. (4.5)
Figure 5 shows the result of using formula (4.5) to estimate λ, for a series series of
simulations where the real value of λ ranges from .5 (i.e. T1 = 1/2 T2) to 2 (i.e. T1 = 2 T2):
we see that the average values of such estimations are well aligned with the actual values.
The magnitude of the standard deviation for our estimations, on the other hand, is
considerable, especially in view of the fact the 10000 realisations of the process were used
to estimate each value of λ: it is clear that a substantial increase of efficiency will be
needed to make the theory relevant to actual empirical phenomena.
This practical consideration ought not to be allowed, however, to obfuscate the fact
that equation (4.5) provides a direct mathematical relation between combined moments
of the population quantities x, F1 and F2, and parameter λ = T1/T2, which arguably
contains information about the function of a genetic sequence.
5 Outlook
We have shown that the effect of differentiating the lifetimes of two phenotypes inde-
pendently from their fertility includes a qualitative change in the equilibrium state of a
wing auxiliary quantities
R =
F2
F1
·
1/x − F1/x
1/(1 − x) − F2/(1 − x)
,
Q1 =
F1/x
F1
− 1, Q2 =
F2/x
F2
− 1.
n terms of these quantities, the equation for λ takes the following form
R = λ
λQ1 + L − 1
Q2 + λ(L − 1)
,
his relation leads to a quadratic equation that only admits one non-negative solution:
λ =
1
2Q1
(R − 1)(L − 1) +
(R − 1)2(L − 1)2 + 4RQ1Q2
. (4.5)
igure 5 shows the result of using formula (4.5) to estimate λ, for a series series of
ations where the real value of λ ranges from .5 (i.e. T1 = 1/2 T2) to 2 (i.e. T1 = 2 T2):
e that the average values of such estimations are well aligned with the actual values.
he magnitude of the standard deviation for our estimations, on the other hand, is
derable, especially in view of the fact the 10000 realisations of the process were used
timate each value of λ: it is clear that a substantial increase of efficiency will be
ed to make the theory relevant to actual empirical phenomena.
his practical consideration ought not to be allowed, however, to obfuscate the fact
following auxiliary quantities
R =
F2
F1
·
1/x − F1/x
1/(1 − x) − F2/(1 − x)
,
Q1 =
F1/x
F1
− 1, Q2 =
F2/x
F2
− 1.
In terms of these quantities, the equation for λ takes the following form
R = λ
λQ1 + L − 1
Q2 + λ(L − 1)
,
and this relation leads to a quadratic equation that only admits one non-negative solution
λ =
1
2Q1
(R − 1)(L − 1) +
(R − 1)2(L − 1)2 + 4RQ1Q2
. (4.5
Figure 5 shows the result of using formula (4.5) to estimate λ, for a series series o
simulations where the real value of λ ranges from .5 (i.e. T1 = 1/2 T2) to 2 (i.e. T1 = 2 T2)
we see that the average values of such estimations are well aligned with the actual values
The magnitude of the standard deviation for our estimations, on the other hand, i
considerable, especially in view of the fact the 10000 realisations of the process were used
to estimate each value of λ: it is clear that a substantial increase of efficiency will b
needed to make the theory relevant to actual empirical phenomena.
This practical consideration ought not to be allowed, however, to obfuscate the fac
that equation (4.5) provides a direct mathematical relation between combined moment
30. Summary
• Playing
with
details
of
process
is
fun
• In
principle
“func'onal”
parameter
λ
=
T1 / T2
can
be
es'mated
from
“popula'on
observables”
x ,
F1 ,
F2
32. “Disclaimer”
…there
is
no
such
thing
as
the
“func'on
of
a
gene”,
like
there
is
no
such
thing
as
the
“meaning
of
a
word”.
For
example,
the
word
“Gene”
can
mean:
…but
dic'onaries
exist
(and
they
are
obsolete
(but
they’re
a
start
(or
maybe
not))).
or