1. Multivariate analyses & decoding
Kay Henning Brodersen
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering, University of Zurich & ETH Zurich
Machine Learning and Pattern Recognition Group
Department of Computer Science, ETH Zurich
http://people.inf.ethz.ch/bkay
3. Why multivariate?
Multivariate approaches can be used to examine responses that are jointly encoded
in multiple voxels.
[Figure: responses of two voxels, $v_1$ and $v_2$, to orange juice vs. apple juice. Each voxel on its own shows no significant difference (n.s.), yet the joint response across $v_1$ and $v_2$ separates the two conditions.]
4. Why multivariate?
Multivariate approaches can utilize "hidden" quantities such as coupling strengths.
[Figure: DCM schematic. Hidden underlying neuronal activities $z_1(t)$, $z_2(t)$, $z_3(t)$ and their coupling strengths give rise to the observed BOLD signals $x_1(t)$, $x_2(t)$, $x_3(t)$; the system is perturbed by a driving input $u_1(t)$ and a modulatory input $u_2(t)$.]
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage
7. Encoding vs. decoding
An encoding model maps from a context (a cause or consequence, e.g., a condition, stimulus, or response) to the BOLD signal; a decoding model maps in the opposite direction:
encoding model  $g: x_t \mapsto y_t$  (estimated by minimizing the prediction error)
decoding model  $h: y_t \mapsto x_t$
context $x_t \in \mathbb{R}$;  BOLD signal $y_t \in \mathbb{R}^v$
8. Regression vs. classification
Regression model: a continuous dependent variable is predicted from independent variables $X$ (regressors).
Classification model: a categorical dependent variable (the label) is predicted from independent variables $X$ (features).
9. Univariate vs. multivariate models
A univariate model considers a single voxel at a time; a multivariate model considers many voxels at once.
univariate:  $x_t \in \mathbb{R} \rightarrow y_t \in \mathbb{R}$;  multivariate:  $x_t \in \mathbb{R} \rightarrow y_t \in \mathbb{R}^v$, $v \gg 1$
In the univariate case, spatial dependencies between voxels are only introduced afterwards, through random field theory. Multivariate models enable inferences on distributed responses without requiring focal activations.
10. Prediction vs. inference
The goal of prediction is to find a highly accurate encoding or decoding function; the goal of inference is to decide between competing hypotheses.
Examples of prediction: predicting a cognitive state using a brain-machine interface; predicting a subject-specific diagnostic status.
Examples of inference: comparing a model that links distributed neuronal activity to a cognitive state with a model that does not; weighing the evidence for sparse vs. distributed coding.
predictive density:  $p(x_{\mathrm{new}} \mid y_{\mathrm{new}}, X, Y) = \int p(x_{\mathrm{new}} \mid y_{\mathrm{new}}, \theta)\, p(\theta \mid X, Y)\, d\theta$
marginal likelihood (model evidence):  $p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$
11. Goodness of fit vs. complexity
Goodness of fit is the degree to which a model explains observed data.
Complexity is the flexibility of a model (including, but not limited to, its number of
parameters).
[Figure: data generated from a true function, fitted by models $M$ with 1, 4, and 9 parameters, illustrating underfitting, an optimal trade-off, and overfitting.]
We wish to find the model that optimally trades off goodness of fit and complexity.
Bishop (2007) PRML
12. Summary of modelling terminology
General Linear Model (GLM)
• mass-univariate encoding model
• to regress context onto brain activity and find clusters of similar effects
Dynamic Causal Modelling (DCM)
• multivariate encoding model
• to evaluate connectivity hypotheses
Classification
• multivariate decoding model
• to predict a categorical context label from brain activity
Multivariate Bayes (MVB)
• multivariate decoding model
• to evaluate anatomical and coding hypotheses
14. Constructing a classifier
A principled way of designing a classifier would be to adopt a probabilistic approach:
For a new observation $y_t$, assign the label $x_t$ that maximizes the posterior $p(x_t \mid y_t, X, Y)$.
In practice, classifiers differ in terms of how strictly they implement this principle.
Generative classifiers use Bayes' rule to estimate $p(x_t \mid y_t) \propto p(y_t \mid x_t)\, p(x_t)$, e.g., Gaussian Naïve Bayes, Linear Discriminant Analysis.
Discriminative classifiers estimate $p(x_t \mid y_t)$ directly, e.g., logistic regression, the Relevance Vector Machine.
Discriminant classifiers estimate $x_t$ directly, without Bayes' theorem, e.g., Fisher's Linear Discriminant, the Support Vector Machine.
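The generative family can be illustrated with a minimal Gaussian Naïve Bayes classifier written from scratch in NumPy and applied to synthetic "two-voxel" data. Everything here (data, names, seeds) is illustrative and not from the slides:

```python
import numpy as np

def fit_gnb(X, y):
    """Estimate class priors and per-feature Gaussians (naive independence assumption)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, stds

def predict_gnb(X, model):
    """Apply Bayes' rule: p(class | x) is proportional to p(x | class) p(class)."""
    classes, priors, means, stds = model
    # log p(x | class) under independent Gaussians, plus the log prior
    log_post = np.stack([
        -0.5 * np.sum(((X - m) / s) ** 2 + 2 * np.log(s), axis=1) + np.log(p)
        for p, m, s in zip(priors, means, stds)
    ], axis=1)
    return classes[np.argmax(log_post, axis=1)]

# Synthetic two-voxel data: two classes with shifted means
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([2.0, 2.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

model = fit_gnb(X, y)
accuracy = np.mean(predict_gnb(X, model) == y)
```

With well-separated classes, the training accuracy lands well above the 50% chance level; a discriminative or discriminant classifier would be trained on the same data without ever modelling $p(y_t \mid x_t)$.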
15. Support vector machine (SVM)
[Figure: decision boundaries of a linear SVM and a nonlinear SVM in a two-voxel feature space ($v_1$, $v_2$).]
Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
16. Stages in a classification analysis
A classification analysis proceeds in three stages: feature extraction, classification using cross-validation, and performance evaluation, e.g., Bayesian mixed-effects inference, $p = 1 - P(\pi > \pi_0 \mid k, n)$ (mixed effects).
17. Feature extraction for trial-by-trial classification
We can obtain trial-wise estimates of neural activity by filtering the data with a GLM.
Trial-wise coefficients are obtained from the GLM $Y = X\beta + \varepsilon$, where $Y$ is the data, $X$ the design matrix, and $\beta = (\beta_1, \ldots, \beta_n)^T$ the coefficients. With one boxcar regressor per trial, the estimate of the coefficient for the trial-2 regressor, $\beta_2$, reflects activity on trial 2.
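This estimation step can be sketched as ordinary least squares in NumPy. The boxcar regressors below are a deliberate simplification (no HRF convolution), and all sizes, onsets, and coefficient values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_scans, n_trials = 120, 6

# Design matrix X: one boxcar regressor per trial, plus a constant column
X = np.zeros((n_scans, n_trials + 1))
for t in range(n_trials):
    onset = 10 + 18 * t
    X[onset:onset + 8, t] = 1.0   # boxcar for trial t
X[:, -1] = 1.0                    # constant term

# Simulate a voxel whose activity on trial 2 (column index 1) is strong
true_beta = np.array([0.5, 3.0, 0.4, 0.6, 0.5, 0.4, 1.0])
y = X @ true_beta + rng.normal(0, 0.5, n_scans)

# Least-squares estimate: beta_hat minimizes ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat[1] is the trial-wise activity estimate for trial 2
```

The vector of trial-wise estimates (here `beta_hat[:6]`) then serves as the feature representation passed on to the classifier.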
18. Cross-validation
The generalization ability of a classifier can be estimated using a resampling procedure
known as cross-validation. One example is 2-fold cross-validation:
[Figure: 100 examples split into 2 folds; in each fold, one half of the examples serves as training examples and the other half as test examples.]
19. Cross-validation
A more commonly used variant is leave-one-out cross-validation.
[Figure: leave-one-out cross-validation with 100 examples and 100 folds; in each fold, a single example is held out as the test example and the remaining 99 serve as training examples. The fold-wise results feed into the performance evaluation.]
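A minimal sketch of leave-one-out cross-validation, assuming NumPy and using a simple nearest-centroid classifier as a stand-in for the classifiers discussed earlier (data and seed are illustrative):

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x_test):
    """Assign x_test to the class with the nearest training-set mean."""
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    return min(centroids, key=lambda c: np.linalg.norm(x_test - centroids[c]))

def leave_one_out_accuracy(X, y):
    """Each example is tested exactly once, trained on all remaining examples."""
    correct = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # hold out example i
        pred = nearest_centroid_predict(X[mask], y[mask], X[i])
        correct += (pred == y[i])
    return correct / len(y)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(1.5, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
acc = leave_one_out_accuracy(X, y)
```

The key property is that the held-out example never influences the training step of its own fold, so `acc` is an (approximately) unbiased estimate of generalization ability.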
20. Performance evaluation
Single-subject study with $n$ trials
The most common approach is to assess how likely it is that the obtained number of correctly classified trials could have occurred by chance.
Binomial test:  $p = P(K \geq k \mid H_0) = 1 - B(k - 1 \mid n, \pi_0)$
In MATLAB:  p = 1 - binocdf(k-1, n, pi_0)
where $k$ is the number of correctly classified trials, $n$ the total number of trials, $\pi_0$ the chance level (typically 0.5), and $B$ the binomial cumulative distribution function.
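The one-sided tail $P(K \geq k \mid n, \pi_0)$ can also be computed exactly with the Python standard library (a dependency-free sketch; the example numbers are hypothetical):

```python
from math import comb

def binomial_p_value(k, n, pi_0=0.5):
    """One-sided p-value: probability of k or more correct trials under chance."""
    return sum(comb(n, i) * pi_0 ** i * (1 - pi_0) ** (n - i)
               for i in range(k, n + 1))

p = binomial_p_value(60, 100)   # e.g., 60 of 100 trials classified correctly
```

For 60 out of 100 trials at a chance level of 0.5, this yields a p-value just below 0.03, i.e., significantly above chance at the conventional 0.05 threshold.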
22. Performance evaluation
Group study with $m$ subjects, $n$ trials each
In a group setting, we must account for both within-subjects (fixed-effects) and between-subjects (random-effects) variance components.
Binomial test on concatenated data (fixed effects):  $p = 1 - B\left(\sum_j k_j \mid \sum_j n_j, \pi_0\right)$
Binomial test on averaged data (fixed effects):  $p = 1 - B\left(\tfrac{1}{m}\sum_j k_j \mid \tfrac{1}{m}\sum_j n_j, \pi_0\right)$
t-test on summary statistics (random effects):  $t = \dfrac{\bar{a} - \pi_0}{\hat{\sigma}_{m-1}/\sqrt{m}}$,  $p = 1 - t_{m-1}(t)$
Bayesian mixed-effects inference (mixed effects):  $p = 1 - P(\pi > \pi_0 \mid k, n)$
where $\bar{a}$ is the sample mean of the subject-wise sample accuracies, $\hat{\sigma}_{m-1}$ their sample standard deviation, $\pi_0$ the chance level (typically 0.5), and $t_{m-1}$ the cumulative Student's $t$-distribution with $m-1$ degrees of freedom.
Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (under review)
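The random-effects summary-statistic approach can be sketched in NumPy. The subject accuracies below are hypothetical; the sketch returns the t-statistic and its degrees of freedom rather than a p-value (evaluating the t CDF would require SciPy):

```python
import numpy as np

def summary_statistic_t(accuracies, pi_0=0.5):
    """One-sample t-statistic of subject-wise accuracies against chance."""
    a = np.asarray(accuracies, dtype=float)
    m = len(a)
    t = (a.mean() - pi_0) / (a.std(ddof=1) / np.sqrt(m))
    return t, m - 1   # t-statistic and degrees of freedom

# 12 hypothetical subject-wise classification accuracies
acc = [0.61, 0.55, 0.72, 0.58, 0.49, 0.66, 0.63, 0.57, 0.70, 0.54, 0.59, 0.62]
t, df = summary_statistic_t(acc)
# compare t against a Student's t distribution with df degrees of freedom
```

Because each subject contributes a single summary statistic, between-subjects variability enters the denominator, which is what makes this a random-effects analysis.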
23. Spatial deployment of informative regions
Which brain regions are jointly informative of a cognitive state of interest?
Searchlight approach: a sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere, yielding a map of t-scores. (Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS)
Whole-brain approach: a constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test, yielding a map of t-scores. (Mourao-Miranda et al. (2005) NeuroImage; Lomakina et al. (in preparation))
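A one-dimensional toy version of the searchlight idea, assuming NumPy; a real analysis uses a sphere in 3-D voxel space, but the logic (evaluate a cross-validated classifier within each local neighbourhood) is the same. All data and sizes are synthetic:

```python
import numpy as np

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        c0 = X[mask][y[mask] == 0].mean(axis=0)
        c1 = X[mask][y[mask] == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        hits += (pred == y[i])
    return hits / len(y)

rng = np.random.default_rng(3)
n_trials, n_voxels, radius = 60, 30, 2
y = np.array([0, 1] * (n_trials // 2))

# Noise everywhere; an informative signal only in voxels 10-14
X = rng.normal(0.0, 1.0, (n_trials, n_voxels))
X[:, 10:15] += np.outer(y, np.ones(5)) * 2.0

# Searchlight: evaluate the classifier within each voxel's neighbourhood
accuracy_map = np.array([
    loo_accuracy(X[:, max(0, v - radius):v + radius + 1], y)
    for v in range(n_voxels)
])
# accuracy_map peaks around the informative voxels and stays near chance elsewhere
```

Mapping `accuracy_map` back onto voxel coordinates (and converting fold-wise results to t-scores, as on the slide) then localizes the informative region.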
24. Summary: research questions for classification
[Figure: four example research questions. Overall classification accuracy: left or right button? truth or lie? healthy or ill? (e.g., 80% or 55% against a 50% chance level). Spatial deployment of discriminative regions. Temporal evolution of discriminability: within-trial accuracy rises above chance as the participant indicates a decision. Model-based classification into {group 1, group 2}.]
Pereira et al. (2009) NeuroImage; Brodersen et al. (2009) The New Collection
26. Multivariate Bayes
SPM brings multivariate analyses into the conventional inference framework of
Bayesian hierarchical models and their inversion.
27. Multivariate Bayes
Multivariate analyses in SPM rest on the central tenet that inferences about how
the brain represents things reduce to model comparison.
[Figure: a decoding model relates brain activity to some cause or consequence under competing coding hypotheses, e.g., sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex.]
To make the ill-posed regression problem tractable, MVB uses a prior on voxel
weights. Different priors reflect different coding hypotheses.
29. Specifying the prior for MVB
1st level – spatial coding hypothesis: a $v \times u$ pattern matrix $U$ ($v$ voxels, $u$ patterns). For example, one pattern may allow voxel 2 to play a role on its own, while another allows voxel 3 to play a role only if its neighbours play similar roles.
2nd level – pattern covariance structure $\Sigma$:
$p(\eta) = \mathcal{N}(\eta; 0, \Sigma)$, with $\Sigma = \sum_i \lambda_i \Sigma_i$
Thus: $p(\alpha \mid \lambda) = \mathcal{N}(\alpha; 0, U \Sigma U^T)$ and $p(x \mid \alpha) = \mathcal{N}(x; Y\alpha, \Lambda^{-1})$
30. Inverting the model
Model inversion involves finding the posterior distribution over voxel weights $\alpha$. In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over $\alpha$:
Partition #1 (subset $s^1$):  $\Sigma = \lambda_1 \Sigma_1$
Partition #2 (subsets $s^1, s^2$):  $\Sigma = \lambda_1 \Sigma_1 + \lambda_2 \Sigma_2$
Partition #3 (subsets $s^1, s^2, s^3$; optimal):  $\Sigma = \lambda_1 \Sigma_1 + \lambda_2 \Sigma_2 + \lambda_3 \Sigma_3$
31. Example: decoding motion from visual cortex
MVB can be illustrated using SPM's attention-to-motion example dataset. This dataset is based on a simple block design; the design matrix contains regressors over scans for photic, motion, attention, and a constant. There are three experimental factors:
• photic – display shows random dots
• motion – dots are moving
• attention – subjects asked to pay attention
Büchel & Friston (1999) Cerebral Cortex; Friston et al. (2008) NeuroImage
32. Multivariate Bayes in SPM
Step 1
After having specified and estimated
a model, use the Results button.
Step 2
Select the contrast to be decoded.
34. Multivariate Bayes in SPM
Step 4
Multivariate Bayes can be invoked from within the Multivariate section.
Step 5
Here, the region of interest is specified as a sphere around the cursor. The spatial prior implements a sparse coding hypothesis.
38. Using MVB for point classification
MVB may outperform conventional point classifiers, such as a support vector machine, when it uses a more appropriate coding hypothesis.
39. Summary: research questions for MVB
Where does the brain represent things? Evaluating competing anatomical hypotheses.
How does the brain represent things? Evaluating competing coding hypotheses.
41. Classification approaches by data representation
Model-based classification: how do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?
Activation-based classification: which functional differences allow us to separate groups?
Structure-based classification: which anatomical structures allow us to separate patients and healthy controls?
42. Generative embedding for model-based classification
[Figure: generative embedding pipeline. Step 1 – modelling: measurements from an individual subject are fitted with a subject-specific generative model of regions A, B, and C. Step 2 – embedding: coupling strengths (A→B, A→C, B→B, B→C, ...) define the subject's representation in a model-based feature space. Step 3 – classification: a classification model assesses the discriminability of the groups. Step 4 – evaluation: classification accuracy. Step 5 – interpretation: jointly discriminative connection strengths.]
Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage
Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
48. Discriminative features in model space
[Figure: bilateral auditory network with MGB, HG (A1), and PT in the left (L) and right (R) hemispheres, receiving stimulus input; connections are marked as highly discriminative, somewhat discriminative, or not discriminative.]
49. Generative embedding and DCM
Question 1 – What do the data tell us about hidden processes in the brain?
→ compute the posterior:  $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$
Question 2 – Which model is best w.r.t. the observed fMRI data?
→ compute the model evidence:  $p(m \mid y) \propto p(y \mid m)\, p(m)$, with $p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$
Question 3 – Which model is best w.r.t. an external criterion, e.g., {patient, control}?
→ compute the classification accuracy:
$p(h(y) = x) = \iiint p(h(y) = x \mid y, y_{\mathrm{train}}, x_{\mathrm{train}})\, p(y)\, p(y_{\mathrm{train}}, x_{\mathrm{train}})\, dy\, dy_{\mathrm{train}}\, dx_{\mathrm{train}}$
50. Summary
Classification
• to assess whether a cognitive state is linked to patterns of activity
• to assess the spatial deployment of discriminative activity
Multivariate Bayes
• to evaluate competing anatomical hypotheses
• to evaluate competing coding hypotheses
Model-based analyses
• to assess whether groups differ in terms of patterns of connectivity
• to generate new grouping hypotheses