Model-based analysis using generative embedding

Model-based analysis of disease states of the brain
using generative embedding
Kay H. Brodersen1,2
1 Translational Neuromodeling Unit (TNU), Department of Biomedical Engineering, University of Zurich & ETH Zurich
2 Machine Learning Laboratory, Department of Computer Science, ETH Zurich

Psychiatric spectrum diseases

Schizophrenia, depression, mania, etc.
 genetically based diagnoses impossible
(diverse genetic basis, strong gene-
environment interactions)
 even when symptoms are similar, causes
can differ across patients (multiple
pathophysiological mechanisms)
 large variability in treatment response

Consequences
need to infer on pathophysiological
mechanisms in individual patients!

2

Dissecting diseases into physiologically defined subgroups

voxel-based activity space model-based parameter space

0.4 0.4 -0.15 -0.15

embedding
generative
Voxel (64,-24,4) mm

0.3 0.3 -0.2 -0.2

L.HG → L.HG
0.2 0.2 -0.25 -0.25

0.1 0.1 -0.3 -0.3
type 1
0 0 -0.35 -0.35
type 2
-10 -10 0.5 0.5
-0.1 -0.1 -0.4 -0.4
-0.5 -0.5 0 0 -0.4 -0.4 0 0
0 0 -0.2 -0.2
Voxel (-56,-20,10) mm R.HG → L.HG
0.5 10 0.5 10 0 -0.5 0 -0.5
Voxel (-42,-26,10) mm L.MGB → L.MGB

Brodersen et al. (2011) PLoS Comput Biol

3

Classification approaches by data representation

Model-based
analyses
How do patterns of
hidden quantities (e.g.,
connectivity among brain
regions) differ between groups?

Activation-based
Structure-based analyses
analyses
Which functional
Which anatomical differences allow us to
structures allow us to separate groups?
separate patients and
healthy controls?

4

From models of pathophysiology to clinical applications

 Developing models of (patho)physiological processes u2

• neuronal: synaptic plasticity, neuromodulation u1 x3
• computational: learning, decision making

x1 x2

 dx  
m n

Validation studies in animals & humans   A   ui B (i )   x j D ( j )  x  Cu
dt  i 1 j 1


• can models detect experimentally Voxel-based feature space
induced changes, Generative score space

e.g., specific changes in synaptic plasticity?
0.4 0.4 -0.15 -0.15

embedding
0.3 0.3 -0.2 -0.2

generative
Voxel (64,-24,4) mm

L.HG  L.HG
0.2 0.2 -0.25 -0.25

 Clinical validation studies & translation
0.1 0.1
patients
-0.3 -0.3

patients
• clinical validation of classifications 0
0 -0.35
controls -0.35
controls
• predicting diagnosis, therapeutic response, outcome
-0.1
-0.5
-0.1
-0.5
-10 -10
-0.4 -0.4
0.5 0.5

0 0 -0.4 -0.4 0 0
0 0 Voxel (-56,-20,10) mm -0.2 -0.2 R.HG  L.HG
Voxel (-42,-26,10) mm 0.5 10 0.5 10 0 -0.5 0 -0.5
L.MGB  L.MGB
Klaas E. Stephan

5

Colleagues & collaborators

Thomas Schofield Justin R Chumbley
University College London University of Zurich

Cheng Soon Ong Jean Daunizeau
National ICT Australia · University of Melbourne ICM Paris · University College London

Kate Lomakina Joachim M Buhmann
University of Zurich · ETH Zurich ETH Zurich

Alexander Leff Klaas Enno Stephan
University College London (UCL) University of Zurich · ETH Zurich · UCL

Christoph Mathys
University of Zurich · ETH Zurich

6

Univariate vs. multivariate models

A univariate model considers a A multivariate model considers
single voxel at a time. many voxels at once.

context BOLD signal context BOLD signal
𝑋𝑡 ∈ ℝ 𝑑 𝑌𝑡 ∈ ℝ 𝑋𝑡 ∈ ℝ 𝑑 𝑌𝑡 ∈ ℝ 𝑣 , v ≫ 1

Spatial dependencies between voxels Multivariate models enable
are only introduced afterwards, inferences on distributed responses
through random field theory. without requiring focal activations.

7

Prediction vs. inference

The goal of prediction is to find The goal of inference is to decide
a highly accurate encoding or between competing hypotheses.
decoding function.

predicting a cognitive predicting a comparing a model that weighing the
state using a subject-specific links distributed neuronal evidence for
brain-machine diagnostic status activity to a cognitive sparse vs.
interface state with a model that distributed coding
does not

predictive density marginal likelihood (model evidence)
𝑝 𝑋 𝑛𝑒𝑤 𝑌 𝑛𝑒𝑤 , 𝑋, 𝑌 = ∫ 𝑝 𝑋 𝑛𝑒𝑤 𝑌 𝑛𝑒𝑤 , 𝜃 𝑝 𝜃 𝑋, 𝑌 𝑑𝜃 𝑝 𝑋 𝑌 = ∫ 𝑝 𝑋 𝑌, 𝜃 𝑝 𝜃 𝑑𝜃

8

Goodness of fit vs. complexity

Goodness of fit is the degree to which a model explains observed data.
Complexity is the flexibility of a model (including, but not limited to, its number of
parameters).

𝑌 1 parameter 4 parameters 9 parameters

truth
data
model 𝑋
underfitting optimal overfitting

We wish to find the model that optimally trades off goodness of fit and complexity.

Bishop (2007) PRML

9

Constructing a classifier

A principled way of designing a classifier would be to adopt a probabilistic approach:

𝑌𝑡 𝑓 that 𝑘 which maximizes 𝑝 𝑋 𝑡 = 𝑘 𝑌𝑡 , 𝑋, 𝑌

In practice, classifiers differ in terms of how strictly they implement this principle.

Generative classifiers Discriminative classifiers Discriminant classifiers

use Bayes’ rule to estimate estimate 𝑝 𝑋 𝑡 𝑌𝑡 directly estimate 𝑓 𝑌𝑡 directly
𝑝 𝑋 𝑡 𝑌𝑡 ∝ 𝑝 𝑌𝑡 𝑋 𝑡 𝑝 𝑋 𝑡 without Bayes’ theorem

• Gaussian Naïve Bayes • Logistic regression • Fisher’s Linear
• Linear Discriminant • Relevance Vector Discriminant
Analysis Machine • Support Vector Machine

10

Support vector machine (SVM)

Linear SVM Nonlinear SVM

v2

v1
Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press

11

Model-based analysis by generative embedding

step 1 — A step 2 —
A→B
model inversion kernel construction
A→C
B→B
B
C B→C

measurements from subject-specific subject representation in the
an individual subject inverted generative model generative score space
5 0.6

0.5
4

0.4
3
A
(14) R.HG -> L.HG
step 4 — 0.3
step 3 —
Voxel 2

2

interpretation 0.2
analysis
1
0.1
B
C 0
0

-1 -0.1
-2 0 2 4 6 8 -0.4 -0.35 -0.3 -0.25 -0.2 -0.15
Voxel 1 (1) L.MGB -> L.MGB

jointly discriminative separating hyperplane to
connection strengths discriminate between groups

Brodersen et al. (2011) NeuroImage; Brodersen et al. (2011) PLoS Comput Biol

12

Choosing a generative model: DCM for fMRI

BOLD signal haemodynamic forward model
signal 𝑥 = 𝑔(𝑧, 𝜃ℎ )
𝑥2 (𝑡)
signal signal
𝑥1 (𝑡) 𝑥3 (𝑡)

neuronal states neural state equation
activity
𝑧2 (𝑡)
𝑧 = 𝐴 + ∑𝑢 𝑗 𝐵 𝑗 𝑧 + 𝐶𝑢
activity activity
𝑧1 (𝑡) 𝑧3 (𝑡)
intrinsic
connectivity direct inputs
modulation of
driving input 𝑢1(𝑡) modulatory input 𝑢2(𝑡) connectivity

t Friston, Harrison & Penny (2003) NeuroImage
t Stephan & Friston (2007) Handbook of Brain Connectivity

13

Summary of the analysis

1 2 3
estimation of unsupservised training the
group contrasts DCM inversion SVM
based on for each subject on all subjects
all subjects except subject j
except subject j
pre- testing the SVM performance
processing selection of on subject j evaluation
voxels for
regions of
interest
A

B
C

repeat for each subject

14

Example: diagnosis of moderate aphasia

15

Example: diagnosing stroke patients

To illustrate our approach, we aimed to distinguish between stroke patients and
healthy controls, based on non-lesioned regions involved in speech processing.

16


anatomical
regions of interest

y = –26 mm

17


PT PT

HG HG
L (A1) (A1) R

MGB MGB

stimulus input

18

Univariate analysis: parameter densities

L.MGB → L.MGB L.MGB → L.HG L.MGB → L.PT L.HG → L.HG *** L.HG → L.PT *** L.HG → R.HG L.PT → L.MGB L.PT → L.HG

L.PT → L.PT L.PT → R.PT R.MGB → R.MGB R.MGB → R.HG R.MGB → R.PT *** R.HG → L.HG *** R.HG → R.HG R.HG → R.PT
range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x)

R.PT → L.PT R.PT → R.MGB R.PT → R.HG R.PT → R.PT input to L.MGB input to R.MGB patients
range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) controls
range(d1$x, d2$x) range(d1$x, d2$x)

19

Multivariate analysis: connectional fingerprints

patients
controls

20

Full Bayesian approach to performance evaluation

Beta-binomial model Normal-binomial model

𝑝 𝛼 + , 𝛽+ 𝛼+,𝛽+ 𝛼−,𝛽− 𝑝 𝛼 −, 𝛽− Inv-Wish 𝑣0 Σ|Λ−1
0 𝜇, Σ 𝒩 𝜇 𝜇0 , Σ/𝜅0

Beta 𝜇 + |𝛼+,𝛽+
𝑗 𝜋+
𝑗 𝜋−
𝑗
Beta 𝜇 − |𝛼−,𝛽−
𝑗 𝜌𝑗 𝒩2 𝜌 𝑗 𝜇, Σ

Bin 𝑘 + 𝜋 + , 𝑛+
𝑗 𝑗 𝑗 𝑘+
𝑗 𝑘−
𝑗
Bin 𝑘 − 𝜋 − , 𝑛−
𝑗 𝑗 𝑗 Bin 𝑘+ 𝜎 𝜌 𝑗,1 ,𝑛+
𝑗 𝑗 𝑘+
𝑗 𝑘−
𝑗
Bin 𝑘− 𝜎 𝜌 𝑗,2 ,𝑛−
𝑗 𝑗

𝑗 = 1… 𝑚 𝑗 = 1… 𝑚

Brodersen et al. (under review)

21

Classification performance

100
100% * Activation-based analyses
a anatomical feature selection
90
90% c mass-univariate contrast feature selection
balanced accuracy

s locally univariate searchlight feature selection
balanced accuracy

p PCA-based dimensionality reduction
80%
80
Correlation-based analyses
m correlations of regional means
70%
70 e correlations of regional eigenvariates
z Fisher-transformed eigenvariates correlations
60%
60 Model-based analyses
o gen.embed., original full model
n.s. n.s.
50% f gen.embed., less plausible feedforward model
50
162 8977p2m332 07o24389 60
a 1c 1 s 34791 e 3 z 81 f 3 l 3 r l gen.embed., left hemisphere only
r gen.embed., right hemisphere only
activation- correlation- model-
based based based

22

Biologically less plausible models perform poorly

L.PT R.PT L.PT R.PT

L.HG R.HG L.HG R.HG
(A1) (A1) (A1) (A1)

L.MGB R.MGB L.MGB R.MGB

auditory stimuli auditory stimuli auditory stimuli

feedforward connections only left hemisphere only right hemisphere only
accuracy: 77% accuracy: 81% accuracy: 59%

23

The generative projection

Voxel-based contrast space Model-based parameter space

0.4 0.4 -0.15 -0.15

embedding
generative
Voxel (64,-24,4) mm

0.3 0.3 -0.2 -0.2

L.HG → L.HG
0.2 0.2 -0.25 -0.25

0.1 0.1 -0.3 -0.3

patients
0 0 -0.35 -0.35
controls
-10 -10 0.5 0.5
-0.1 -0.1 -0.4 -0.4
-0.5 -0.5 0 0 -0.4 -0.4 0 0
0 0 -0.2 -0.2
Voxel (-56,-20,10) mm R.HG → L.HG
Voxel (-42,-26,10) mm 0.5 10 0.5 10 0 -0.5 0 -0.5
L.MGB → L.MGB

classification accuracy classification accuracy
(using all voxels in the regions of interest) (using all 23 model parameters)
75% 98%
24

Discriminative features in model space

PT PT

HG HG
(A1) (A1)
L R

MGB MGB

stimulus input

25

Discriminative features in model space

PT PT

HG HG
(A1) (A1)
L R

MGB MGB

stimulus input highly discriminative
somewhat discriminative
not discriminative

26

Generative embedding and DCM

Question 1 – What do the data tell us about hidden processes in the brain?
 compute the posterior ?
𝑝 𝑦 𝜃,𝑚 𝑝 𝜃 𝑚
𝑝 𝜃 𝑦, 𝑚 =
𝑝 𝑦 𝑚

Question 2 – Which model is best w.r.t. the observed fMRI data?
 compute the model evidence
𝑝 𝑚 𝑦 ∝ 𝑝 𝑦 𝑚 𝑝(𝑚)
= ∫ 𝑝 𝑦 𝜃, 𝑚 𝑝 𝜃 𝑚 𝑑𝜃
?

Question 3 – Which model is best w.r.t. an external criterion?
 compute the classification accuracy
{ patient,
𝑝 ℎ 𝑦 = 𝑥 𝑦 control }
= 𝑝 ℎ 𝑦 = 𝑥 𝑦, 𝑦train , 𝑥train 𝑝 𝑦 𝑝 𝑦train 𝑝 𝑥train 𝑑𝑦 𝑑𝑦train 𝑑𝑥train

27

Model-based classification using DCM

model-based
classification { group 1,
group 2 }

activation-based model selection
classification
vs.

structure-based inference on ?
classification model
parameters

28

Model-based inference on individual pathophysiology

 model of neuronal (patho)physiology

 application to brain activity data  diagnostic classification
from individual patients

type 1 treatment X

spectrum model-based
type 2 treatment Y
disease diagnostic tests

type 3 treatment Z

29

Model-based analysis using generative embedding

Recommended

Recommended

More Related Content

What's hot

What's hot (7)

Viewers also liked

Viewers also liked (8)

Similar to Model-based analysis using generative embedding

Similar to Model-based analysis using generative embedding (10)

Model-based analysis using generative embedding