Population Genetics of Gene Function

Ignacio Gallo
Glasgow, December 2012

Motivation

“Molecular signatures of natural selection”, Nielsen 2005:

“inferences regarding the patterns and distribution of selection in genes and genomes may provide important functional information”

Wikipedia entry on “statistical thermodynamics”:

“The goal of statistical thermodynamics is to understand and to interpret the measurable macroscopic properties of materials in terms of the properties of their constituent particles and the interactions between them”

Motivation

The functional importance of a genetic sequence can be inferred from its population distribution (for example from its degree of conservation).

Can anything more be said about the gene’s specific function (survival, reproduction, etc.)?

[Diagram: “distribution of genes” is related to “gene function” in the way that “macroscopic properties of materials” are related to “properties of their constituent particles (size, speed, etc.)”.]

Moran model: model variation

[Figure: birth and death events, and the resulting change in frequency for a given phenotype. The transition diagram carries the labels $p_-$, $q_-$, $1 - p_- - q_-$, $p_+$, $q_+$.]

We have two phenotypes, P1 and P2.

$p_-$, $p_+$: death and birth probabilities for P1
$q_-$, $q_+$: death and birth probabilities for P2

If $p_- + q_- < 1$, in some intervals nothing happens.

If a death happens, a birth happens instantaneously (“musical chairs” process).

[Figure annotations on the transition diagram: “loop” and “no loop”.]
We differentiate the phenotypes’ reproductive fitness and lifetimes independently, and consider reproduction and survival as two different functions.

Reproduction:
$W_1$: offspring for type 1,
$W_2$: offspring for type 2.

Survival:
$T_1$: average lifespan for type 1,
$T_2$: average lifespan for type 2.

$u$: mutation probability

and we are interested in the equilibrium distribution of a process with symmetric reversible mutation for haploid individuals.

Death probabilities:

$$p_- = \frac{x}{T_1}, \qquad q_- = \frac{1-x}{T_2} \quad\Rightarrow\quad p_- + q_- < 1.$$

Birth probabilities:

$$p_+ = \frac{\frac{W_1}{T_1}(1-u)\,x + \frac{W_2}{T_2}\,u\,(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}(1-x)}, \qquad q_+ = \frac{\frac{W_1}{T_1}\,u\,x + \frac{W_2}{T_2}(1-u)(1-x)}{\frac{W_1}{T_1}\,x + \frac{W_2}{T_2}(1-x)}.$$

Variable $x$ is the frequency of phenotype P1 (so the frequency of P2 is $1 - x$).
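The death and birth probabilities can be sketched in Python as follows. This is a minimal illustration rather than the author's code: the function names `transition_probs` and `step`, the bookkeeping through a count `k` of P1 individuals in a population of size `N`, and the choice to evaluate the birth probabilities at the frequency before the death are all assumptions made for the example.

```python
import random


def transition_probs(x, W1, W2, T1, T2, u):
    """Return (p_minus, p_plus, q_minus, q_plus) at P1 frequency x."""
    p_minus = x / T1                 # a P1 individual dies
    q_minus = (1.0 - x) / T2         # a P2 individual dies
    a1 = W1 / T1 * x                 # reproductive weight of P1
    a2 = W2 / T2 * (1.0 - x)         # reproductive weight of P2
    denom = a1 + a2
    p_plus = (a1 * (1.0 - u) + a2 * u) / denom   # newborn is P1
    q_plus = (a1 * u + a2 * (1.0 - u)) / denom   # newborn is P2
    return p_minus, p_plus, q_minus, q_plus


def step(k, N, W1, W2, T1, T2, u, rng=random):
    """One time interval for a population with k individuals of type P1.

    Assumption (not stated on the slides): birth probabilities are
    evaluated at the frequency x = k / N observed before the death.
    """
    x = k / N
    p_minus, p_plus, q_minus, _ = transition_probs(x, W1, W2, T1, T2, u)
    r = rng.random()
    if r < p_minus:
        k -= 1                       # a P1 individual dies
    elif r < p_minus + q_minus:
        pass                         # a P2 individual dies
    else:
        return k                     # nothing happens in this interval
    if rng.random() < p_plus:        # the vacancy is filled at once
        k += 1
    return k
```

Because a birth only ever follows a death, the population size stays fixed at `N`, which is the “musical chairs” property described on the slides.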

Asymptotic parameters

To get a non-trivial distribution the following asymptotic constraints are imposed on the parameters:

$$N u \xrightarrow[N \to \infty]{} \theta, \qquad N\left(\frac{W_1}{W_2} - 1\right) \xrightarrow[u \to 0]{} s.$$

For notational convenience we also define $\lambda = \frac{T_1}{T_2}$.

The model therefore depends on: $\theta$ (mutation), $s$ (reproduction), $\lambda$ (survival).
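For a finite population, one way to pick microscopic parameters compatible with given values of θ, s and λ is to invert the limits directly. The sketch below does that; the function name `finite_n_parameters` and the baseline values `W2` and `T2` are hypothetical, and treating the limits as exact equalities at finite N is an approximation assumed only for illustration.

```python
def finite_n_parameters(theta, s, lam, N, W2=1.0, T2=2.0):
    """Finite-N parameters consistent with (theta, s, lam).

    Assumptions: N*u = theta and N*(W1/W2 - 1) = s are imposed as
    equalities; W2 and T2 are arbitrary baselines (T2 > 1 keeps the
    death probabilities summing to less than one when lam >= 1).
    """
    u = theta / N                  # mutation probability
    W1 = W2 * (1.0 + s / N)        # offspring number for type 1
    T1 = lam * T2                  # average lifespan for type 1
    return {"u": u, "W1": W1, "W2": W2, "T1": T1, "T2": T2}
```

With s = −5, θ = 7 and N = 1000 this gives a mutation probability of 0.007 and a fitness ratio W1/W2 of 0.995.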

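Equilibrium distribution

$$M = E[X_{t+1} - X_t] = \frac{1}{N T_1} \cdot \frac{\theta \lambda^2 (1-x)^2 + \lambda s\, x(1-x) - \theta x^2}{x + \lambda(1-x)},$$

$$V = E\left[(X_{t+1} - X_t)^2\right] = \frac{2\lambda}{N T_1} \cdot \frac{x(1-x)}{x + \lambda(1-x)}.$$

The Wright equilibrium distribution for large $N$ is

$$\varphi(x) = C \cdot \frac{1}{V} \cdot \exp\left\{ 2 \int \frac{M}{V}\, dx \right\},$$

which explicitly gives

$$\varphi(x) = C\, e^{\alpha x}\, x^{\lambda\theta - 1} (1-x)^{\frac{\theta}{\lambda} - 1} \{ x + \lambda(1-x) \},$$

where

$$\alpha = s + \theta\left(\frac{1}{\lambda} - \lambda\right).$$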
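The step from Wright's formula to the explicit density can be checked by hand. The worked computation below uses only the expressions for M and V; the common factor 1/(N T_1) and the denominator x + λ(1 − x) cancel in the ratio, and constant factors are absorbed into C.

```latex
\begin{align*}
\frac{2M}{V}
  &= \frac{\theta\lambda^2(1-x)^2 + \lambda s\,x(1-x) - \theta x^2}{\lambda\,x(1-x)}
   = \theta\lambda\,\frac{1-x}{x} + s - \frac{\theta}{\lambda}\,\frac{x}{1-x}, \\
\int \frac{2M}{V}\,dx
  &= \theta\lambda\,(\ln x - x) + s\,x + \frac{\theta}{\lambda}\,\bigl(x + \ln(1-x)\bigr) \\
  &= \alpha x + \lambda\theta \ln x + \frac{\theta}{\lambda}\ln(1-x),
   \qquad \alpha = s + \theta\Bigl(\frac{1}{\lambda} - \lambda\Bigr), \\
\frac{1}{V}
  &\propto \frac{x + \lambda(1-x)}{x(1-x)}, \\
\varphi(x)
  &= C\,e^{\alpha x}\,x^{\lambda\theta - 1}(1-x)^{\frac{\theta}{\lambda} - 1}\{x + \lambda(1-x)\}.
\end{align*}
```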

Typical shapes for equilibrium distributions

[Figure: three panels plotting probability density against the relative frequency of “blue” phenotypes, for low mutation, high mutation, and mutation probability close to $\frac{1}{N}$.]
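The shapes of the equilibrium density can be explored numerically. The sketch below evaluates the explicit density on a grid and normalizes it with a crude midpoint rule; the function names and the grid size are assumptions for illustration, and the normalization is only rough when the density diverges at the boundaries (rescaled mutation rate below 1).

```python
import math


def log_phi(x, s, theta, lam):
    """Logarithm of the unnormalized equilibrium density at 0 < x < 1."""
    alpha = s + theta * (1.0 / lam - lam)
    return (alpha * x
            + (lam * theta - 1.0) * math.log(x)
            + (theta / lam - 1.0) * math.log(1.0 - x)
            + math.log(x + lam * (1.0 - x)))


def density_on_grid(s, theta, lam, n=999):
    """Return grid midpoints and the density normalized by a midpoint rule."""
    xs = [(i + 0.5) / n for i in range(n)]      # midpoints avoid x = 0 and x = 1
    logs = [log_phi(x, s, theta, lam) for x in xs]
    top = max(logs)                             # subtract the max to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights) / n                    # midpoint-rule integral
    return xs, [w / total for w in weights]
```

For example, `density_on_grid(0.0, 2.0, 1.0)` gives a single interior peak, while `density_on_grid(0.0, 0.2, 1.0)` piles the mass up near the two boundaries.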

Stationary points

[Figures: stationary points plotted against “u”, the probability of mutation per site, with the value $\frac{1}{N}$ marked on the axis and the regimes “random drift” and “mutation/selection balance” indicated. Panels: $\lambda = \frac{T_1}{T_2} = 1$ (lifetimes are equal for the two phenotypes) and $\lambda = \frac{T_1}{T_2} = \frac{3}{2}$.]
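Assuming that “stationary points” refers to interior points where the derivative of the equilibrium density vanishes, they can be located numerically from the derivative of the log-density. The sketch below scans a grid for sign changes and refines each one by bisection; the function names, grid size and tolerance are choices made for this example only.

```python
def dlog_phi(x, s, theta, lam):
    """Derivative of the log equilibrium density at 0 < x < 1."""
    alpha = s + theta * (1.0 / lam - lam)
    return (alpha
            + (lam * theta - 1.0) / x
            - (theta / lam - 1.0) / (1.0 - x)
            + (1.0 - lam) / (x + lam * (1.0 - x)))


def stationary_points(s, theta, lam, n=2000, tol=1e-12):
    """Interior zeros of dlog_phi, found by grid scan plus bisection."""
    def f(x):
        return dlog_phi(x, s, theta, lam)

    xs = [(i + 0.5) / n for i in range(n)]
    roots = []
    for a, b in zip(xs, xs[1:]):
        if f(a) * f(b) >= 0.0:
            continue                  # no strict sign change in this cell
        lo, hi = a, b
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0.0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots
```

With equal lifetimes, s = −5 and θ = 7, the condition reduces to 5x² − 17x + 6 = 0, whose only root in (0, 1) is x = 0.4, and the routine recovers that value.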

Inferring $\lambda = \frac{T_1}{T_2}$ from population statistics

The model includes one more parameter than the standard setting, so it is desirable to expand the number of independent statistics.

This can be done by considering the amount of synonymous variation included in each of our two phenotypes, and treating it as neutral (as done by Nielsen and colleagues for various types of models).

The amount of synonymous variation can be quantified using the concept of the inbreeding coefficient.

Distribution of a gene throughout a population (Kreitman 1983)

[Figure: genotypes grouped into the two phenotypes P1 and P2.]

Statistical quantities

x = relative frequency of P1
F1 = inbreeding coefficient for P1
F2 = inbreeding coefficient for P2

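A result by Kimura and Crow gives that, for only one phenotype,

$$F \approx \frac{1}{1 + 2\theta},$$

where $\theta$ is the rescaled mutation rate.

This can be extended to the case of two phenotypes to give two equations, with $\langle\cdot\rangle$ denoting an average:

$$\left\langle \frac{F_1}{x} \right\rangle + \theta\left[ \lambda\left( \left\langle \frac{F_1}{x} \right\rangle - \langle F_1 \rangle \right) + (L-1)\,\langle F_1 \rangle \right] - \left\langle \frac{1}{x} \right\rangle \approx 0,$$

$$\left\langle \frac{F_2}{1-x} \right\rangle + \theta\left[ \frac{1}{\lambda}\left( \left\langle \frac{F_2}{1-x} \right\rangle - \langle F_2 \rangle \right) + (L-1)\,\langle F_2 \rangle \right] - \left\langle \frac{1}{1-x} \right\rangle \approx 0.$$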
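To make the three statistics concrete, the sketch below computes x, F1 and F2 from a single population snapshot. It assumes, and this is an assumption rather than something stated on the slides, that the inbreeding coefficient of a phenotype is taken as the probability that two individuals drawn with replacement from that phenotype carry the same genotype. The data layout (a list of `(phenotype, genotype)` pairs) and all function names are hypothetical.

```python
from collections import Counter


def inbreeding_coefficient(genotypes):
    """Probability that two draws (with replacement) share a genotype."""
    counts = Counter(genotypes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


def kimura_crow_F(theta):
    """Single-phenotype approximation F = 1 / (1 + 2 * theta)."""
    return 1.0 / (1.0 + 2.0 * theta)


def population_statistics(population):
    """Return (x, F1, F2) for a list of (phenotype, genotype) pairs.

    Phenotypes are labelled "P1" and "P2"; F is NaN for an absent phenotype.
    """
    g1 = [g for p, g in population if p == "P1"]
    g2 = [g for p, g in population if p == "P2"]
    x = len(g1) / len(population)
    F1 = inbreeding_coefficient(g1) if g1 else float("nan")
    F2 = inbreeding_coefficient(g2) if g2 else float("nan")
    return x, F1, F2
```

Under this definition a phenotype represented by a single genotype has F = 1, and one split evenly between two genotypes has F = 0.5.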

This (hideous) formula can be used to derive the value of λ from combined moments of the quantities x, F1, F2 estimated from a set of simulated realizations of the process.

[Figure: “Estimated” against “Real” values of λ, with axis ticks from 0.5 to 2; the extremes are marked T1 = 1/2 T2 and T1 = 2 T2.]

Simulation parameters: s = −5, θ = 7, N = 1000, L (genotype length) = 40
Number of realizations per point = 10000
Running time = 5000 “generations”
Define the following auxiliary quantities:

$$R = \frac{\langle F_2 \rangle}{\langle F_1 \rangle} \cdot \frac{\langle 1/x \rangle - \langle F_1/x \rangle}{\langle 1/(1-x) \rangle - \langle F_2/(1-x) \rangle}, \qquad Q_1 = \frac{\langle F_1/x \rangle}{\langle F_1 \rangle} - 1, \qquad Q_2 = \frac{\langle F_2/(1-x) \rangle}{\langle F_2 \rangle} - 1.$$

In terms of these quantities, the equation for λ takes the following form

$$R = \lambda\, \frac{\lambda Q_1 + L - 1}{Q_2 + \lambda (L - 1)},$$

and this relation leads to a quadratic equation that only admits one non-negative solution:

$$\lambda = \frac{1}{2 Q_1} \left\{ (R-1)(L-1) + \sqrt{(R-1)^2 (L-1)^2 + 4 R\, Q_1 Q_2} \right\}. \qquad (4.5)$$

Figure 5 shows the result of using formula (4.5) to estimate λ, for a series of simulations where the real value of λ ranges from 0.5 (i.e. T1 = 1/2 T2) to 2 (i.e. T1 = 2 T2): we see that the average values of such estimations are well aligned with the actual values.

The magnitude of the standard deviation for our estimations, on the other hand, is considerable, especially in view of the fact that 10000 realisations of the process were used to estimate each value of λ: it is clear that a substantial increase of efficiency will be needed to make the theory relevant to actual empirical phenomena.

This practical consideration ought not to be allowed, however, to obfuscate the fact that equation (4.5) provides a direct mathematical relation between combined moments of the population quantities x, F1 and F2, and parameter λ = T1/T2, which arguably contains information about the function of a genetic sequence.

5 Outlook

We have shown that the effect of differentiating the lifetimes of two phenotypes independently from their fertility includes a qualitative change in the equilibrium state of a …
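Formula (4.5) is straightforward to evaluate once the moments are available. The sketch below is one possible arrangement, not the author's implementation: the function names and the input layout (a list of `(x, F1, F2)` triples, one per realization, with 0 < x < 1) are hypothetical. Q2 is computed here with 1 − x in the denominator, which is what eliminating θ between the two moment equations for F1 and F2 requires; treat that choice as an assumption.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def auxiliary_quantities(samples):
    """Return (R, Q1, Q2) from a list of (x, F1, F2) realizations."""
    mF1 = _mean([f1 for _, f1, _ in samples])
    mF2 = _mean([f2 for _, _, f2 in samples])
    mF1_x = _mean([f1 / x for x, f1, _ in samples])
    mF2_y = _mean([f2 / (1.0 - x) for x, _, f2 in samples])
    m_inv_x = _mean([1.0 / x for x, _, _ in samples])
    m_inv_y = _mean([1.0 / (1.0 - x) for x, _, _ in samples])
    R = (mF2 / mF1) * (m_inv_x - mF1_x) / (m_inv_y - mF2_y)
    Q1 = mF1_x / mF1 - 1.0
    Q2 = mF2_y / mF2 - 1.0          # assumption: 1 - x, not x, in the denominator
    return R, Q1, Q2


def relation_R(lam, Q1, Q2, L):
    """Right-hand side of R = lam * (lam*Q1 + L - 1) / (Q2 + lam*(L - 1))."""
    return lam * (lam * Q1 + L - 1.0) / (Q2 + lam * (L - 1.0))


def estimate_lambda(R, Q1, Q2, L):
    """Root of the quadratic taken in formula (4.5)."""
    b = (R - 1.0) * (L - 1.0)
    disc = b * b + 4.0 * R * Q1 * Q2
    return (b + math.sqrt(disc)) / (2.0 * Q1)
```

A quick consistency check is to choose λ, Q1, Q2 and L, compute R with `relation_R`, and confirm that `estimate_lambda` returns the chosen λ.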

Summary

•  Playing with details of process is fun

•  In principle “functional” parameter λ = T1 / T2 can be estimated from “population observables” x, F1, F2

“Disclaimer”

…there is no such thing as the “function of a gene”, like there is no such thing as the “meaning of a word”.

For example, the word “Gene” can mean:

[image] or [image]

…but dictionaries exist (and they are obsolete (but they’re a start (or maybe not))).
