1. When Bayes meets Darwin: a journey in population genomics
michael.blum@imag.fr
Laboratoire TIMC-IMAG, Grenoble
2. In “The Descent of Man”, Darwin concluded that the visible differences between human populations were not adaptive to any significant degree […]
“Natural selection has almost become irrelevant in human evolution. There's been no biological change in humans in 40,000 or 50,000 years” (Stephen J. Gould)
3. But here is a counter-example
• Tibetan populations have adapted to their high-altitude, low-oxygen environment thanks to an increased respiratory rate and increased blood flow.
• These traits are transmitted from generation to generation.
• The Tibetan plateau has been inhabited for about 20,000 years.
4. Local adaptation
• Human adaptation to high altitude is an instance of local adaptation.
• Understanding how individuals adapt to their local environment is central in biology. Plants adapt to their environment, bacteria adapt to antibiotics…
• Definition of local adaptation: greater fitness (a measure of reproductive success) of individuals in their local habitats due to natural selection.
How to find genomic regions involved in local adaptation?
6. Single Nucleotide Polymorphism (SNP)
Indiv 1: ....ACCCG………. / ....AACCG……….   Number of copies: 1, 0
Indiv 2: ….ACCCT………. / ….ACCCT……….   Number of copies: 0, 2
• 3 billion base pairs in the human genome
• Commercial SNP chips: 100€ for 500,000 SNPs
• dbSNP: $> 10^6$ SNPs
7. Single Nucleotide Polymorphism (SNP)
Data matrix Y
          Locus 1   Locus 2   Locus 3
Indiv 1   1         0         2
Indiv 2   0         2         0
Indiv 3   0         0         0
Indiv 4   0         1         1
Indiv 5   1         1         1
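As a small illustration of how the data matrix is used, the sketch below stores the matrix from this slide as a list of rows and computes, for each locus, the frequency of the counted allele: the column sum divided by 2n, since each of the n individuals carries two chromosome copies. The function and variable names are illustrative and do not come from the talk.

```python
# Genotype matrix Y from the slide: rows are individuals, columns are loci.
# Each entry is the number of copies (0, 1 or 2) of the counted allele.
Y = [
    [1, 0, 2],
    [0, 2, 0],
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]

def allele_frequencies(genotypes):
    """Frequency of the counted allele at each locus (diploid individuals)."""
    n = len(genotypes)      # number of individuals
    p = len(genotypes[0])   # number of loci
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]

freqs = allele_frequencies(Y)
print(freqs)
```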
8. Main principle of population genomics
• Genome-wide patterns are influenced by neutral processes: migration, admixture, expansion.
• Genes involved in local adaptation are outliers.
15. Singular Value Decomposition (SVD) viewpoint of PCA
In matrix notation, we have $Y = UV$, where Y is the (n,p) genotype matrix, U is the (n,K) score matrix and V is the (K,p) loadings matrix.
Variations around SVD in machine learning: matrix factorization, low-rank approximation, probabilistic PCA, factor analysis, …
16. Singular Value Decomposition (SVD) viewpoint of PCA
An optimal approximation of rank K for the matrix of genotypes Y:
$Y_i = \sum_{k=1}^{K} u_i^k V_k$
$Y_i$: genotype of the ith individual (0,1,1,2,0,0,…..)
$V_k$: vector of loadings $(v_{k,1}, v_{k,2}, v_{k,3}, \dots)$ of the same length as $Y_i$
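The rank-K approximation can be sketched with a truncated SVD in NumPy. This is a minimal illustration, not the speaker's implementation: the toy matrix is the 5 × 3 genotype matrix from slide 7, the columns are centred before the decomposition (a common PCA convention that the slides do not state), and the singular values are folded into the scores U so that the product U V gives the approximation.

```python
import numpy as np

def rank_k_approximation(Y, K):
    """Return scores U (n,K), loadings V (K,p) and the rank-K matrix U @ V."""
    # Thin SVD: Y = A @ diag(s) @ Bt, singular values in decreasing order.
    A, s, Bt = np.linalg.svd(Y, full_matrices=False)
    U = A[:, :K] * s[:K]   # scores, with the singular values folded in
    V = Bt[:K, :]          # loadings, one row per component
    return U, V, U @ V

# Toy genotype matrix (5 individuals, 3 loci).
Y = np.array([[1, 0, 2],
              [0, 2, 0],
              [0, 0, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

Yc = Y - Y.mean(axis=0)    # ASSUMPTION: centre each locus before PCA
U, V, Y_hat = rank_k_approximation(Yc, K=2)

print(U.shape, V.shape)
print(np.linalg.norm(Yc - Y_hat))  # Frobenius norm of the residual
```

By the Eckart-Young theorem, the Frobenius norm of the residual depends only on the discarded singular values, which is the sense in which the truncation is optimal.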
17. Bayesian principal component analysis
• A probabilistic version of PCA (Tipping and Bishop 1999)
$Y_i = \sum_{k=1}^{K} u_i^k V_k + \varepsilon_i$
• The variance-inflation model for outlier detection (Box and Tiao 1968)
$p(v_j) = (1 - \pi)\, N(0, \sigma^2) + \pi\, N(0, c^2 \sigma^2)$,
where $\pi$ is the genome-wide outlier probability, and the prior for $c^2$ is $\mathrm{Uniform}(1, c^2_{\max})$.
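To see how the variance-inflation mixture separates outliers from the bulk, the sketch below computes the posterior probability that a single scalar loading v was drawn from the inflated component. It is a simplification made for illustration: π, σ and c are treated as known constants and v as directly observed, whereas in the model on the slide these quantities are latent and c² has its own prior.

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def outlier_posterior(v, pi, sigma, c):
    """P(outlier | v) under (1 - pi) N(0, sigma^2) + pi N(0, c^2 sigma^2).

    ASSUMPTION: pi, sigma and c are fixed and v is a scalar; the full
    hierarchical model would integrate over them instead.
    """
    inflated = pi * normal_pdf(v, c * sigma)
    baseline = (1.0 - pi) * normal_pdf(v, sigma)
    return inflated / (inflated + baseline)

# Hypothetical values: a loading near zero versus one far in the tail.
print(outlier_posterior(0.0, pi=0.05, sigma=1.0, c=3.0))
print(outlier_posterior(5.0, pi=0.05, sigma=1.0, c=3.0))
```

A loading near zero gets a posterior below the prior π, because the narrow component explains it better; a loading several σ away is attributed almost entirely to the inflated component.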
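18. Accounting for local correlation in the genome
Local correlation because of recombination
Ising model (outlier $Z_j = 1$, non-outlier $Z_j = 0$)
$P(Z_j = 1) \propto \pi \exp\left(\beta \sum_{k \sim j} Z_k\right)$,
where $\beta > 0$ is a hyperparameter and the sum runs over the SNPs k that are neighbours of j.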
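The effect of the Ising prior can be illustrated by normalising the weight above against a weight for the non-outlier state. The slide gives only the weight for Z_j = 1, so two choices here are assumptions made for the sketch: the non-outlier weight is taken as 1 − π, and the neighbours of SNP j are taken to be the adjacent SNPs j − 1 and j + 1 along the genome.

```python
import math

def prob_outlier_given_neighbours(z, j, pi, beta):
    """Conditional prior probability that SNP j is an outlier.

    z    : list of 0/1 outlier indicators along the genome
    pi   : genome-wide outlier probability
    beta : coupling strength (beta > 0 favours clusters of outliers)
    """
    # ASSUMPTION: neighbours are the immediately adjacent SNPs.
    neighbours = [k for k in (j - 1, j + 1) if 0 <= k < len(z)]
    s = sum(z[k] for k in neighbours)
    w1 = pi * math.exp(beta * s)   # weight for Z_j = 1, as on the slide
    w0 = 1.0 - pi                  # ASSUMPTION: weight for Z_j = 0
    return w1 / (w1 + w0)

# With no outlier neighbours the probability is pi; it rises with each one.
print(prob_outlier_given_neighbours([0, 0, 0], 1, pi=0.05, beta=1.0))
print(prob_outlier_given_neighbours([1, 0, 1], 1, pi=0.05, beta=1.0))
```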
19. A hierarchical Bayesian model
Gibbs sampler for sampling the posterior
[Figure: graphical model with nodes π, β, σ, Z, K, U, V, Y, c, σ0, cmax]
21. Bayesian scores for detecting outliers
• Bayes factors: a Bayesian alternative to P-values
$\mathrm{BF} = P(Y_j \mid \text{outlier}) \,/\, P(Y_j \mid \text{non-outlier})$
• Posterior odds
$P(\text{outlier} \mid Y_j) \,/\, P(\text{non-outlier} \mid Y_j) = \text{prior odds} \times \mathrm{BF}$
• For any list of outlier SNPs, a false discovery rate can be estimated based on posterior odds.
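The chain from Bayes factor to false discovery rate can be sketched as follows. Posterior odds are converted to posterior probabilities with odds / (1 + odds), and the estimated FDR of a list is taken as the average posterior probability of being a non-outlier over the SNPs in the list, which is one common estimator and not necessarily the one used in the talk. The Bayes factors and the prior odds below are hypothetical numbers chosen for illustration.

```python
def posterior_odds(prior_odds, bayes_factor):
    """Posterior odds = prior odds * Bayes factor."""
    return prior_odds * bayes_factor

def posterior_probability(odds):
    """Convert odds in favour of 'outlier' into a probability."""
    return odds / (1.0 + odds)

def estimated_fdr(posterior_probs):
    """Expected fraction of false discoveries in a list of SNPs:
    the mean of (1 - posterior outlier probability) over the list."""
    return sum(1.0 - p for p in posterior_probs) / len(posterior_probs)

prior_odds = 0.05 / 0.95            # hypothetical prior: 5% of SNPs are outliers
bayes_factors = [200.0, 50.0, 3.0]  # hypothetical Bayes factors for three SNPs

probs = [posterior_probability(posterior_odds(prior_odds, bf))
         for bf in bayes_factors]
print(probs)
print(estimated_fdr(probs))
```

Adding a SNP with a weak Bayes factor to the list raises the estimated FDR, which gives a natural rule for deciding where to cut the list.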
22. Example 1: a simulation study in a divergence model
Neutral divergence (ms)
Divergence with selection (simuPOP)
4% out of 10,000 SNPs under selection
23. Other methods for genome scans of local adaptation
• Fst: a measure of differentiation between populations
• BayeScan (Foll and Gaggiotti 2008)
• Both methods assume (implicitly or explicitly) a mechanistic model of instantaneous divergence
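For reference, a per-locus Fst can be sketched from allele frequencies. The version below uses the classical heterozygosity definition, Fst = (H_T − H_S) / H_T, for one biallelic locus and two populations given equal weight; this is one of several Fst estimators and the talk does not say which one was used, so the choice is an assumption.

```python
def fst_biallelic(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic locus.

    p1, p2 : frequency of the same allele in populations 1 and 2.
    ASSUMPTION: two populations with equal weights, no sample-size correction.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                               # total heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0     # mean within-population
    if h_t == 0.0:
        return 0.0  # monomorphic locus: Fst is undefined, 0 returned by convention here
    return (h_t - h_s) / h_t

print(fst_biallelic(0.5, 0.5))  # identical frequencies
print(fst_biallelic(0.2, 0.8))  # differentiated populations
print(fst_biallelic(0.0, 1.0))  # fixed difference
```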
33. Enrichment analysis
Are PC2 outliers enriched for genes involved in immunity?
[Figure: scatter plot of PC1 versus PC2; population labels: Africa, Americas, Oceania, Asia, Middle-East, Europe, East Asia]
34. Big data
What can you do with millions of SNPs?
Scalable Bayesian computation?
Standard PCA and permutation tests.
35. A George Box (1919-2013) story to conclude
• Box wanted to write a paper with Cox because having a Box and Cox paper would be fun.
• They decided to write a paper on transformations.
• One author wrote the Bayesian version and the other wrote the maximum likelihood version. We do not know who wrote what.
• In the end, it did not make much practical difference.