variational bayes in biophysics

bio+stats vbem/networks hierarchical
variational and hierarchical modeling
for biological data
chris wiggins
columbia
april 23, 2012
chris.wiggins@columbia.edu 4/23/12
Chris Wiggins
• APAM: Department of Applied Physics and Applied Mathematics;
• C2B2: Center for Computational Biology and Bioinformatics;
• CISB: Columbia University Initiative in Systems Biology
• ISDE: Institute for Data Sciences and Engineering
Columbia University
September 28, 2012

thanks. . .
- jake hofman (vbmod,vbfret)
- jonathan bronson (vbfret)
- jan-willem van de meent (hfret)
- ruben gonzalez (vbfret, hfret)
for more info:
- vbfret.sourceforge.net
- vbmod.sourceforge.net
- hfret.sourceforge.net (soon)
BMC bioinformatics, 2010;
PNAS 2009;
Biophysical Journal 2009;

1 biology and statistics
genomics
generative modeling
2 variational/biological networks
variational Bayesian expectation maximization
inference
model selection
3 hierarchical/time series
biological challenges
inference
model selection

biology and statistics:

genomics:
plos comp bio '10, nyas '07, bmc bioinfo '06a, bmc bioinfo '06b,
bioinfo '04, regulatory genomics '04, IEEE '05; NIPS (MLCB '03, '06)

generative modeling:
bmc bioinfo '10, PNAS '09, biophys j '09, PNAS '07, PNAS '06, NIPS (MLCB '09, '10, '11), IEEE sig. proc. '12; plos one '08, cell '07, prl '06, jcs '06, biophys j '06
pami '08, prl '08, NIPS (MLCB '08), PNAS '05, bmc bioinfo '04

variational/biological networks:
- variational bayesian expectation maximization
- inference
- model selection

introduction formulation results extensions motivation history
introduction:
chris.wiggins@columbia.edu 22.2.12 vbmod.sourceforge.net
Hartwell, Hopfield, Leibler and Murray, Nature vol. 402 (suppl.), 2 December 1999 (www.nature.com)

motivation:
community detection in networks
social networks
biological networks
problem: over-fitting/resolution limit

history:
by community
math/cs: spectral methods (Fiedler ’74, Shi + Malik ’00)
math/cs: clustering generally (Taskar, Koller, Getoor)
physics: modularity
common thread: test w/ stochastic block model (’76, ’83)
ergo: use as inference tool (Hastings 0604429,
Newman + Leicht 061148)

formulation:
generative model
maximum likelihood
maximum evidence
complexity control. . .
variational/mft. . .
algorithm
in physics: “test hamiltonian”
in ML: “variational Bayesian methods” (Jordan, MacKay)

generative model:
foreach node, roll a K-sided die with bias π to choose z_i ∈ {1, . . . , K}
foreach pair of nodes i ≠ j, flip a coin with bias ϑ+ if z_i = z_j, else ϑ−
draw an edge if the coin lands heads up
Stochastic block models (Holland, Laskey, Leinhardt 1983; Wang and Wong, 1987)
[Figure: graphical model with nodes π, θ, z_i, z_j, A_ij; plate over pairs i ≠ j]
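The die-rolling and coin-flipping recipe fits in a few lines of Python. This is a minimal sketch assuming NumPy; the function name `sample_constrained_sbm` and its arguments are hypothetical, not part of vbmod.

```python
import numpy as np


def sample_constrained_sbm(n_nodes, pi, theta_plus, theta_minus, seed=None):
    """Draw (A, z) from the constrained stochastic block model (illustrative helper).

    pi:          length-K die bias over modules
    theta_plus:  coin bias for pairs in the same module (z_i == z_j)
    theta_minus: coin bias for pairs in different modules
    """
    rng = np.random.default_rng(seed)
    n_modules = len(pi)
    # Roll the K-sided die once per node.
    z = rng.choice(n_modules, size=n_nodes, p=pi)
    # Choose each pair's coin according to whether the modules agree.
    same = z[:, None] == z[None, :]
    bias = np.where(same, theta_plus, theta_minus)
    # Flip one coin per unordered pair i < j and mirror it (undirected graph).
    heads = np.triu(rng.random((n_nodes, n_nodes)) < bias, k=1)
    adjacency = (heads | heads.T).astype(int)
    return adjacency, z
```

Setting `theta_plus` well above `theta_minus` produces the block-diagonal adjacency structure that the inference step then tries to recover.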

generative model. . . (bis)
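Die rolling, coin flipping, and priors:
$$p(A,\vec z\mid\vec\pi,\vartheta_+,\vartheta_-) = \vartheta_+^{c_+}(1-\vartheta_+)^{d_+}\,\vartheta_-^{c_-}(1-\vartheta_-)^{d_-}\prod_{\mu=1}^{K}\pi_\mu^{n_\mu}$$
where the counts are:
- c_+: edges within modules
- d_+: non-edges within modules
- c_−: edges between modules
- d_−: non-edges between modules
- n_μ: nodes in each module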
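Given a hard assignment z, the five counts (within-module edges c_+ and non-edges d_+, between-module edges c_− and non-edges d_−, module sizes n_μ) and the log-likelihood they determine, ln p = c_+ ln ϑ+ + d_+ ln(1 − ϑ+) + c_− ln ϑ− + d_− ln(1 − ϑ−) + Σ_μ n_μ ln π_μ, are straightforward to compute. A sketch assuming NumPy and integer labels 0..K−1; the helper names are illustrative.

```python
import numpy as np


def block_counts(adjacency, z, n_modules):
    """Return (c_plus, d_plus, c_minus, d_minus, n_mu) for a hard assignment z."""
    adjacency = np.asarray(adjacency)
    z = np.asarray(z)
    iu = np.triu_indices(len(z), k=1)           # each unordered pair i < j once
    edge = adjacency[iu] == 1
    same = z[iu[0]] == z[iu[1]]
    c_plus = int(np.sum(edge & same))           # edges within modules
    d_plus = int(np.sum(~edge & same))          # non-edges within modules
    c_minus = int(np.sum(edge & ~same))         # edges between modules
    d_minus = int(np.sum(~edge & ~same))        # non-edges between modules
    n_mu = np.bincount(z, minlength=n_modules)  # nodes in each module
    return c_plus, d_plus, c_minus, d_minus, n_mu


def log_likelihood(adjacency, z, pi, theta_plus, theta_minus):
    """ln p(A, z | pi, theta_+, theta_-) expressed through the counts."""
    pi = np.asarray(pi, dtype=float)
    c_p, d_p, c_m, d_m, n_mu = block_counts(adjacency, z, len(pi))
    return (c_p * np.log(theta_plus) + d_p * np.log1p(-theta_plus)
            + c_m * np.log(theta_minus) + d_m * np.log1p(-theta_minus)
            + np.sum(n_mu * np.log(pi)))
```

For the 4-node path 0-1-2-3 split as z = [0, 0, 1, 1], the counts are c_+ = 2, d_+ = 0, c_− = 1, d_− = 3 and n = [2, 2].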

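max likelihood:
• Die rolling, coin flipping <-> infinite-range spin-glass Potts model:
$$H \equiv -\ln p(A,\vec z\mid\vec\pi,\vec\vartheta) = -\sum_{i<j}\left(J_L A_{ij} - J_G\right)\delta_{z_i,z_j} + \sum_{\mu=1}^{K} h_\mu \sum_{i=1}^{N}\delta_{z_i,\mu} + \text{const.}$$
$$J_G \equiv \ln\frac{1-\vartheta_-}{1-\vartheta_+},\qquad J_L \equiv \ln\frac{\vartheta_+}{\vartheta_-} + J_G,\qquad h_\mu \equiv -\ln\pi_\mu$$
(the constant does not depend on z)
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2006)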
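A quick numerical check of the Potts rewriting: with J_G = ln[(1 − ϑ−)/(1 − ϑ+)], J_L = ln(ϑ+/ϑ−) + J_G and h_μ = −ln π_μ, the Potts energy differs from −ln p(A, z | π, ϑ) only by a constant that does not depend on z. A sketch assuming NumPy; both function names are illustrative.

```python
import numpy as np


def potts_energy(adjacency, z, pi, theta_plus, theta_minus):
    """z-dependent part of H = -ln p(A, z | pi, theta), written as a Potts model."""
    adjacency = np.asarray(adjacency, dtype=float)
    z = np.asarray(z)
    j_g = np.log((1.0 - theta_minus) / (1.0 - theta_plus))   # J_G
    j_l = np.log(theta_plus / theta_minus) + j_g              # J_L
    h = -np.log(np.asarray(pi, dtype=float))                  # h_mu = -ln pi_mu
    iu = np.triu_indices(len(z), k=1)                         # pairs i < j
    same = z[iu[0]] == z[iu[1]]                               # delta(z_i, z_j)
    coupling = -np.sum((j_l * adjacency[iu] - j_g) * same)
    field = np.sum(h[z])                                      # sum_mu h_mu n_mu
    return coupling + field


def neg_log_joint(adjacency, z, pi, theta_plus, theta_minus):
    """-ln p(A, z | pi, theta) computed directly: one die roll per node, one coin per pair."""
    z = np.asarray(z)
    total = -np.sum(np.log(np.asarray(pi, dtype=float)[z]))
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            theta = theta_plus if z[i] == z[j] else theta_minus
            total -= np.log(theta if adjacency[i][j] else 1.0 - theta)
    return total
```

The gap between the two is −Σ_{i<j} [A_ij ln ϑ− + (1 − A_ij) ln(1 − ϑ−)], the same for every assignment, so minimizing H over z is the same as maximizing the likelihood.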

max evidence:
Increasing complexity

max evidence:
http://research.microsoft.com/~minka/statlearn/demo/

max evidence:

max evidence:
cf. “BIC”, Schwarz 1978
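The Occam's-razor behaviour of the evidence shows up even without a network: compare a fair-coin model (no free parameters) with a coin of unknown bias under a uniform prior, and put Schwarz's BIC next to the exact answer. This toy example is an illustration chosen here, not taken from the talk; it uses only the Python standard library.

```python
import math


def log_evidence_fair(heads, tails):
    """ln p(D | M0): a fixed fair coin, no free parameters."""
    return (heads + tails) * math.log(0.5)


def log_evidence_uniform(heads, tails):
    """ln p(D | M1): coin bias integrated against a uniform Beta(1, 1) prior.

    p(D | M1) = B(heads + 1, tails + 1) = heads! tails! / (heads + tails + 1)!
    """
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))


def bic_score(heads, tails):
    """Schwarz's large-n approximation to ln p(D | M1): ln L(theta_hat) - (k / 2) ln n."""
    n = heads + tails
    log_lik = 0.0
    for count in (heads, tails):
        if count:
            log_lik += count * math.log(count / n)
    return log_lik - 0.5 * math.log(n)   # k = 1 free parameter
```

For 50 heads and 50 tails the simpler fair-coin model has the higher evidence, while for 80 heads and 20 tails the flexible model wins; the BIC stays within O(1) of the exact log-evidence.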


max likelihood:
• Infer distributions over spin assignments, coupling constants, and
chemical potentials, and find the number of occupied spin states

max evidence:
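$$p(A\mid K) = \sum_{\vec z}\int d\vec\pi\int d\vec\vartheta\; p(A,\vec z,\vec\vartheta,\vec\pi) = \sum_{\vec z}\int d\vec\pi\int d\vec\vartheta\; e^{-H}\,p(\vec\pi)\,p(\vec\vartheta)$$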

max evidence:
Extends Newman (2004, 2006), Hastings (2006), Bornholdt & Reichardt (2004 & 2006)
Can do the integrals,
but the sum over z is
intractable, O(K^N);
use mean-field
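For a handful of nodes the sum can be done exactly, which makes the K^N cost concrete. This sketch assumes conjugate priors, a symmetric Dirichlet(α) on π and Beta(a, b) on both ϑ+ and ϑ− (the default hyperparameters are arbitrary choices), integrates the parameters analytically, and enumerates every assignment z. Standard library only; function names are illustrative.

```python
import itertools
import math


def log_beta(a, b):
    """ln B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_joint_marginal(adjacency, z, k, alpha=1.0, a=1.0, b=1.0):
    """ln p(A, z | K) with pi ~ Dir(alpha) and theta_+, theta_- ~ Beta(a, b) integrated out."""
    n = len(z)
    c_p = d_p = c_m = d_m = 0
    for i in range(n):
        for j in range(i + 1, n):
            edge = adjacency[i][j]
            if z[i] == z[j]:
                c_p += edge
                d_p += 1 - edge
            else:
                c_m += edge
                d_m += 1 - edge
    n_mu = [sum(1 for zi in z if zi == mu) for mu in range(k)]
    # Dirichlet-multinomial term for the die rolls.
    log_p_z = (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
               + sum(math.lgamma(alpha + m) - math.lgamma(alpha) for m in n_mu))
    # Beta-Bernoulli terms for the two coins.
    log_p_a = (log_beta(a + c_p, b + d_p) - log_beta(a, b)
               + log_beta(a + c_m, b + d_m) - log_beta(a, b))
    return log_p_z + log_p_a


def log_evidence_exact(adjacency, k, **priors):
    """ln p(A | K) by summing over all K**N assignments (log-sum-exp)."""
    n = len(adjacency)
    terms = [log_joint_marginal(adjacency, z, k, **priors)
             for z in itertools.product(range(k), repeat=n)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Each additional node multiplies the work by K, which is why the exact sum is abandoned in favor of a mean-field approximation.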

max evidence:
• Gibbs’/Jensen’s inequality (log of expected value bounds expected value of log) for any distribution q
Variational Bayes (MacKay, Jordan, Ghahramani, Jaakkola, Saul 1999; cf. Feynman 1972)
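Spelled out for this model, the inequality reads as follows. This is a sketch of the standard argument; defining F as minus the bound, so that F is minimized, is the convention assumed here.

```latex
\ln p(A\mid K)
  = \ln \sum_{\vec z}\int d\vec\pi\, d\vec\vartheta\;
      q(\vec z,\vec\pi,\vec\vartheta)\,
      \frac{p(A,\vec z,\vec\pi,\vec\vartheta\mid K)}{q(\vec z,\vec\pi,\vec\vartheta)}
  \;\ge\; \sum_{\vec z}\int d\vec\pi\, d\vec\vartheta\;
      q(\vec z,\vec\pi,\vec\vartheta)\,
      \ln \frac{p(A,\vec z,\vec\pi,\vec\vartheta\mid K)}{q(\vec z,\vec\pi,\vec\vartheta)}
  \;\equiv\; -F[q]

F[q] = -\ln p(A\mid K)
       + D_{\mathrm{KL}}\left[\,q(\vec z,\vec\pi,\vec\vartheta)\,\middle\|\,p(\vec z,\vec\pi,\vec\vartheta\mid A, K)\right]
       \;\ge\; -\ln p(A\mid K)
```

Minimizing F over q therefore tightens the bound on the evidence and drives q toward the posterior; equality holds when q is the exact posterior.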

max evidence:
why would you do this? (A1):
Beal, 2003

max evidence:
Beal, 2003

max evidence:
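• F is a functional of q; optimizing this approximation to (bound on) the evidence yields an approximation to the posterior
• Take q(z, π, θ) = q(z)q(π)q(θ); Q_iμ is the probability that node i is in module μ, and the expected counts (within-module edges ⟨c_+⟩ and non-edges ⟨d_+⟩, between-module edges ⟨c_−⟩ and non-edges ⟨d_−⟩, module sizes ⟨n_μ⟩) are:
$$\langle n_\mu\rangle = \sum_i Q_{i\mu},\qquad \langle c_+\rangle = \sum_{i<j} A_{ij}\sum_\mu Q_{i\mu}Q_{j\mu},\qquad \langle d_+\rangle = \sum_{i<j}(1-A_{ij})\sum_\mu Q_{i\mu}Q_{j\mu}$$
$$\langle c_-\rangle = \sum_{i<j}A_{ij} - \langle c_+\rangle,\qquad \langle d_-\rangle = \sum_{i<j}(1-A_{ij}) - \langle d_+\rangle$$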

algo:
suggests hard limit in step 3; sparse in step 1

results:
run time
consistency
required plot: good vs. easy
real data
karate
biology
american football

run time:
• Main loop runtime for 10^4 nodes in MATLAB: ~30 seconds

consistency:

consistency:
[Figure: posterior densities p(θ+) and p(θ−) over θ ∈ [0, 1]; N=8, K=2, distribution after 2 iterations]

consistency:
• K=4?
• Automatic complexity control: probability of occupation for extraneous modules
goes to zero

consistency:
The “resolution limit” problem
[Figure: 60-node ring-of-cliques network, two panels: partition found by Variational Bayes and partition found by Girvan-Newman modularity]

consistency:
The “resolution limit” problem
[Figure: Resolution limit problem on ring of 4-node cliques. Left: inferred K* vs. Ktrue (10 to 20) for Variational Bayes (K* = Ktrue) and modularity optimization. Right: GN modularity (Clauset's algorithm) vs. Ktrue for single-clique communities (correct) and double-clique communities (incorrect), modularity axis 0.72 to 0.84]
Girvan-Newman modularity, or a Potts model with fixed parameters, suffers from a resolution limit:
the size of the detected modules depends on the size of the network
(Fortunato et al. 2007; Kumpula et al. 2007)
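The resolution limit can be reproduced with a few lines of arithmetic on a ring of 4-node cliques. This sketch assumes each clique is joined to the next by a single edge (an assumption about the construction) and computes Girvan-Newman modularity Q = Σ_s [l_s/L − (d_s/2L)²] for the single-clique and double-clique partitions. NumPy only; function names are illustrative.

```python
import numpy as np


def ring_of_cliques(n_cliques, clique_size=4):
    """Adjacency of n_cliques complete graphs joined in a ring by single edges."""
    n = n_cliques * clique_size
    A = np.zeros((n, n), dtype=int)
    for c in range(n_cliques):
        lo = c * clique_size
        A[lo:lo + clique_size, lo:lo + clique_size] = 1
        last = lo + clique_size - 1                  # last node of this clique
        nxt = ((c + 1) % n_cliques) * clique_size    # first node of the next clique
        A[last, nxt] = A[nxt, last] = 1
    np.fill_diagonal(A, 0)
    return A


def modularity(A, labels):
    """Girvan-Newman modularity Q = sum_s [l_s / L - (d_s / 2L)^2]."""
    labels = np.asarray(labels)
    two_l = A.sum()                                  # 2L
    q = 0.0
    for s in np.unique(labels):
        members = labels == s
        l_s = A[np.ix_(members, members)].sum() / 2  # edges inside module s
        d_s = A[members].sum()                       # total degree of module s
        q += l_s / (two_l / 2) - (d_s / two_l) ** 2
    return q
```

With this construction the two partitions tie at 14 cliques (Q = 11/14 ≈ 0.786); beyond that, modularity prefers merging neighboring cliques even though each clique is the natural community.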

good vs easy:

real data:
• Correctly infer K=12 conferences
Validation: NCAA football schedule
[Figure: NCAA football schedule network, 115 teams; nodes: teams, edges: games, shape: conference, color: inferred module]

real data:
APS march meeting 2008
[Figure: inferred modules, including superconductivity (experimentalists), nanotubes and graphene, and superconductivity (theorists)]

extensions:
model extensions
full SBM (done)
hierarchical model p(A_ij = 1 | z_j = z_i ± 1)
hierarchical modeling (ensemble of graphs)
$$p(D\mid u, K) = \prod_{i=1}^{L}\int d\vartheta_i\; p(D_i\mid\vartheta_i)\,p(\vartheta_i\mid u, K)$$
ℝ^d embedding (latent variables are real)
more ‘rigorous’ SOM?
‘correct’ for degree (allow variable affinity) (cf. Bader, Karrer)
algorithm extensions
BP (see earlier talks)
map-reduce
cvmod (model selection via cross validation)

full SBM. . .
probability of an edge depends only on block membership:
$$p(A_{ij} = 1\mid z_i = \mu, z_j = \nu) = \vartheta_{\mu\nu}$$

full SBM. . .
• Nodes belong to “blocks” of
varying size
• Roll die for assignment of
nodes to blocks
• Probability of edge between two
nodes depends only on block
membership
• Flip (one of K²) coins for edges
• Result: mixture of Erdős-Rényi
graphs
[Figure: adjacency matrix of a sampled graph, 120 nodes, nz = 2275]
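The full model differs from the constrained one only in using a K × K matrix of coin biases. A minimal NumPy sketch; the function name and the symmetry check are illustrative choices, not vbmod's interface.

```python
import numpy as np


def sample_full_sbm(n_nodes, pi, theta, seed=None):
    """Draw (A, z) with p(A_ij = 1 | z_i = mu, z_j = nu) = theta[mu, nu] (illustrative helper)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    if not np.allclose(theta, theta.T):
        raise ValueError("theta must be symmetric for an undirected graph")
    # Roll the die for each node's block.
    z = rng.choice(len(pi), size=n_nodes, p=pi)
    # Look up one of the K x K coins for every pair.
    bias = theta[z[:, None], z[None, :]]
    # Flip once per unordered pair and mirror: a mixture of Erdos-Renyi blocks.
    heads = np.triu(rng.random((n_nodes, n_nodes)) < bias, k=1)
    return (heads | heads.T).astype(int), z
```

The constrained model is the special case in which every diagonal entry of θ equals ϑ+ and every off-diagonal entry equals ϑ−.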

full SBM. . .
vs

full SBM. . .
[Figure: adjacency matrix, 120 nodes, nz = 2803]

full SBM. . .
>> vbsbm_vs_vbmod(0)
running vbmod ...
Elapsed time is 1.136925 seconds.
running vbsbm ...
Fmod=13089.158019 Fsbm=13144.445782
vbmod wins

full SBM. . .
>> vbsbm_vs_vbmod(0.25)
running vbmod ...
running vbsbm ...
Fmod=20457.142416 Fsbm=19457.306022
vbsbm wins

full SBM. . .
>> vbsbm_vs_vbmod(0.5)
running vbmod ...
running vbsbm ...
Fmod=26133.351210 Fsbm=23921.797625
vbsbm wins

full SBM. . .
• Using the same framework we can compare the constrained and the
full (unconstrained) stochastic block models via p(D|M,K*)
[Figure: two panels of win percentage for the unconstrained model vs. perturbation to the constrained model (0 to 0.1); example adjacency matrices, 120 nodes, nz = 2100, 2048, 2108]

extensions:
model extensions
full SBM (done)
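[Figure: overview of extensions (hierarchical ensemble p(D|u, K), ℝ^d embedding, map-reduce) with the graphical model: π, θ, hyperparameters c and n, z_i, z_j, A_ij for i ≠ j]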


for more info. . .
code: MATLAB & python (inc. “full” SBM) (vbmod.sf.net)
paper: arxiv 08 / prl 08
Hofman soon to come (not by me)
code in C++, inc. full ‘vblabel propagation’ algo
twitter-scale analysis

hierarchical/time series:
- biological challenges
- inference
- model selection

hfret. . .
Jan-Willem van de Meent, Ruben Gonzalez, Chris Wiggins
Columbia University

Ramakrishnan et al – http://www.mrc-lmb.cam.ac.uk/ribo/

hfret. . .
FRET = cy5 / (cy3 + cy5)
Tinoco and Gonzalez, Genes Dev, 2011
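Applied frame by frame, this definition turns the two dye channels into a FRET trajectory. A sketch assuming already background-corrected intensity arrays (no background subtraction or γ-correction is done here); the function name is illustrative.

```python
import numpy as np


def fret_efficiency(cy3, cy5):
    """Apparent FRET = cy5 / (cy3 + cy5), frame by frame."""
    cy3 = np.asarray(cy3, dtype=float)
    cy5 = np.asarray(cy5, dtype=float)
    total = cy3 + cy5
    # Frames with no signal in either channel (e.g. both dyes dark) are undefined.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, cy5 / total, np.nan)
```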

hfret. . .
Tinoco and Gonzalez, Genes Dev, 2011

hfret. . .
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009
(short-lived GS1 states correspond to an EF-G + GDPNP binding event)

hfret. . .
Unbound
EF-G bound
Tinoco and Gonzalez, Genes Dev, 2011 Fei et al, PNAS, 2009

hfret. . .
1. Identify states
2. Estimate Kinetic Rates
3. Average over many time series
4. Detect subpopulations

hfret. . .
FRET Signal

hfret. . .
FRET Signal / Histogram

hfret. . .
FRET Signal / Histogram
Idea: Find probability of belonging to each state

hfret. . .
Expectation Maximization
1. calculate p(z | x, θ_i)
2. calculate θ_{i+1} from p(z | x, θ_i)

hfret. . .
Log-Likelihood
$$L = \log p(x\mid\theta) = \log\sum_z p(x,z\mid\theta)$$
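For the histogram picture, the two EM steps become the familiar updates for a one-dimensional Gaussian mixture over FRET values. A NumPy sketch with user-supplied starting values (all names are illustrative). It ignores time ordering entirely, which is why such a model can get state occupancies right while saying nothing about rates.

```python
import numpy as np


def em_gmm_1d(x, means, sigmas, weights, n_iter=200):
    """EM for a 1-D Gaussian mixture over FRET values (time order ignored)."""
    x = np.asarray(x, dtype=float)
    mu = np.array(means, dtype=float)
    sd = np.array(sigmas, dtype=float)
    w = np.array(weights, dtype=float)
    for _ in range(n_iter):
        # 1. E-step: responsibilities r[n, k] = p(z_n = k | x_n, theta_i).
        log_r = (np.log(w) - np.log(sd) - 0.5 * np.log(2.0 * np.pi)
                 - 0.5 * ((x[:, None] - mu) / sd) ** 2)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # 2. M-step: theta_{i+1} from the responsibilities.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-6)   # guard against a collapsing component
    return mu, sd, w, r
```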

hfret. . .
Learned Truth
Accurate for occupancy of states,
not so good for rate estimates

hfret. . .
$$p(x, z\mid\mu,\sigma,\pi) = p(x\mid z,\mu,\sigma)\,p(z\mid\pi)$$

hfret. . .
probability of state depends on previous state
$$p(z_{t+1} = l\mid z_t = k) = A_{kl}$$

hfret. . .

hfret. . .
$$p(z_1 = k) = \pi_k$$


hfret. . .
$$p(x_t\mid z_t = k) = \mathcal{N}(x_t\mid\mu_k,\sigma_k)$$
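With θ = {μ, σ, π, A} fixed, the state posteriors p(z_t | x, θ) follow from the standard forward-backward recursions. A NumPy sketch with per-step scaling to avoid underflow; the function names are illustrative.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Normal density N(x | mu, sigma), broadcasting over arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def forward_backward(x, pi, A, mu, sigma):
    """Return gamma[t, k] = p(z_t = k | x, theta) and log p(x | theta)."""
    x = np.asarray(x, dtype=float)
    A = np.asarray(A, dtype=float)
    T, K = len(x), len(pi)
    B = gaussian_pdf(x[:, None], np.asarray(mu, float), np.asarray(sigma, float))
    alpha = np.zeros((T, K))
    scale = np.zeros(T)
    # Forward pass, normalized at each step.
    alpha[0] = np.asarray(pi, float) * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, reusing the forward scale factors.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(scale).sum()
```

Inside EM (Baum-Welch), these posteriors, together with the pairwise posteriors of consecutive states, would feed the update of θ.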

hfret. . .
Learned Real
We’ve learned:
parameters: θ = {μ, σ, π, A}
states: p(z | x, θ)

hfret. . .
2 States 3 States

hfret. . .
Log-Likelihood
$$\log p(x\mid\theta) = \log\sum_z p(x,z\mid\theta)$$
Log-Evidence
$$L = \log p(x\mid u) = \log\sum_z\int d\theta\; p(x,z\mid\theta)\,p(\theta\mid u)$$

hfret. . .
Log-Likelihood and Log-Evidence, annotated: p(θ | u) is the prior

hfret. . .
Log-Likelihood and Log-Evidence, annotated: Ensemble

hfret. . .
Log-Likelihood and Log-Evidence, annotated: the best model has the highest average likelihood (the evidence averages the likelihood over the prior)

hfret. . .
Log-Evidence Lower Bound
$$\mathcal{L} = \sum_z\int d\theta\; q(z)\,q(\theta\mid w)\,\log\frac{p(x,z,\theta\mid u)}{q(z)\,q(\theta\mid w)} \;\le\; \log p(x\mid u)$$
with q(z) q(θ | w) ≈ p(z, θ | x)

hfret. . .
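Lower bound tight for true posterior
$$\mathcal{L} = \sum_z\int d\theta\; p(z,\theta\mid x)\log\frac{p(x,z,\theta\mid u)}{p(z,\theta\mid x)} = \sum_z\int d\theta\; p(z,\theta\mid x)\log p(x\mid u) = \log p(x\mid u)$$
$$\mathcal{L} = \log p(x\mid u) - D_{\mathrm{KL}}\big[q(z)\,q(\theta\mid w)\,\big\|\,p(z,\theta\mid x)\big]$$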

hfret. . .
We’ve learned:
parameters: q(θ | w)
states: p(z | x, θ)
VBEM Updates
$$\frac{\delta\mathcal{L}_n}{\delta q(z_n)} = 0,\qquad \frac{\delta\mathcal{L}_n}{\delta q(\theta_n)} = 0$$

hfret. . .
variability: photophysical/experimental

hfret. . .

hfret. . .
Hierarchical Updates
$$\frac{\partial}{\partial u}\sum_n \mathcal{L}_n = 0$$

hfret. . .
Hierarchical Updates
$$\frac{\partial}{\partial u}\sum_n \mathcal{L}_n = 0$$
“two-stage PEB model/CIHM”: Kass and Steffey, JASA 1989

hfret. . .
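Hierarchical Updates / VBEM Updates
Until Σ_n L_n converges:
1. Run VBEM on each trace, until L_n converges:
• Update q(z_n): $\delta\mathcal{L}_n/\delta q(z_n) = 0$
• Update q(θ_n | w_n): $\delta\mathcal{L}_n/\delta q(\theta_n) = 0$
2. Update p(θ | u): $\partial_u \sum_n \mathcal{L}_n = 0$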

hfret. . .
We’ve learned:
p(θ_n, z_n | x_n) ≃ q(θ_n) q(z_n) (for each trace)
p(θ | u) (for ensemble)
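The two nested loops can be illustrated on a deliberately simple stand-in: a conjugate Gaussian model in which every trace n contributes one summary y_n with known noise variance, and the ensemble prior is θ_n ~ N(μ, τ²), so u = (μ, τ²). This is not the hFRET model (which works with HMM traces); it only shows the structure of alternating per-trace posteriors with an ensemble-level update that increases Σ_n L_n. NumPy only; the function name is illustrative.

```python
import numpy as np


def hierarchical_gaussian(y, noise_var, n_outer=500, tol=1e-10):
    """Two-stage empirical-Bayes loop on a toy conjugate Gaussian model.

    Assumed toy model (not hFRET): y[n] ~ N(theta_n, noise_var) for trace n,
    with ensemble prior theta_n ~ N(mu, tau2); hyperparameters u = (mu, tau2).
    """
    y = np.asarray(y, dtype=float)
    mu, tau2 = y.mean(), y.var() + 1.0   # crude starting point for u
    for _ in range(n_outer):
        # Step 1: per-trace posterior q(theta_n) = N(m_n, v_n) given the prior.
        # (Exact for this conjugate model, so the inner loop needs one pass.)
        v = 1.0 / (1.0 / tau2 + 1.0 / noise_var)
        m = v * (mu / tau2 + y / noise_var)
        # Step 2: update u = (mu, tau2) from the per-trace posteriors
        # (an EM step, which cannot decrease the summed marginal likelihood).
        mu_new = m.mean()
        tau2_new = np.mean((m - mu_new) ** 2 + v)
        done = abs(mu_new - mu) < tol and abs(tau2_new - tau2) < tol
        mu, tau2 = mu_new, tau2_new
        if done:
            break
    return mu, tau2, m, v
```

At convergence μ equals the sample mean of y and τ² equals the sample variance of y minus the noise variance (when that difference is positive), which is the marginal-likelihood maximum for this toy model.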

hfret. . .
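$$\xi_{ntkl} = p(z_{nt} = k,\; z_{n,t+1} = l\mid x_n)$$
1. Run mixture model on posterior counts
$$p(\xi_n\mid A) = \prod_{t,k,l} A_{kl}^{\,\xi_{ntkl}},\qquad p(\xi_n\mid u_m) = \int dA\; p(A\mid u_m)\,p(\xi_n\mid A)$$
2. Rerun with M × K block-diagonal form
$$u_A = \mathrm{blockdiag}\big(u_A^{(1)}, \dots, u_A^{(M)}\big)$$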

hfret. . .
[Figure: inferred τ_fast and τ_slow, axis ticks 2 to 64]

hfret. . .
[Figure: inferred τ_fast and τ_slow (axis ticks 2 to 64), compared with reality]

hfret. . .
no EF-G 50 nM EF-G 500 nM EF-G
Fei, Bronson, Hofman, Srinivas, Wiggins, Gonzalez, PNAS, 2009

hfret. . .
$$p(z_k) \propto e^{-G_k/k_B T}$$
$$\log p(z_k) - \log p(z_l) = -\frac{G_k - G_l}{k_B T}$$

hfret. . .
$$p(z_k) \propto e^{-G_k/k_B T}$$
ΔΔG = logit(p)
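Turning occupancies into free-energy differences takes one line once p(z_k) ∝ exp(−G_k/k_BT) is assumed. A sketch with illustrative function names; for two states with occupancy p, G_1 − G_0 = −k_BT logit(p), so the sign attached to logit(p) depends on which state is taken as the reference.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def delta_g_from_occupancy(p_k, p_l, temperature=None):
    """G_k - G_l from occupancies, assuming p(z_k) is proportional to exp(-G_k / k_B T).

    Returns the difference in units of k_B T, or in joules if a temperature
    (in kelvin) is given.
    """
    dg_kt = -(math.log(p_k) - math.log(p_l))
    return dg_kt if temperature is None else dg_kt * K_B * temperature


def logit(p):
    """ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))
```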

hfret. . .
no EF-G: bound fraction and life-times

hfret. . .
5 nM EF-G: bound fraction and life-times

hfret. . .

model selection:

hfret. . .
Low Noise, Underfitted: Inf Out - Inf In

hfret. . .
Low Noise, Correct: Out vs In

hfret. . .
Low Noise, Overfitted: Inf Out - Inf In

hfret. . .
High Noise, Underfitted: Inf Out - Inf In

hfret. . .
High Noise, Correct: Inf Out - Inf In

hfret. . .
High Noise, Overfitted: Inf Out - Inf In

hfret. . .
the future, in progress:
X

traditional role of statistics in biophysics
“if your experiment needs
statistics, you ought to
have done a better
experiment”
-lord rutherford
