The impact of study design on pattern estimation
for single-trial multivariate pattern analysis
Mumford, Jeanette A., Tyler Davis, and Russell A. Poldrack, 2014, NeuroImage
CLMN Journal Club

2019.03.13

Emily Yunha Shin
1
A primer on pattern-based approaches to fMRI:
principles, pitfalls, and perspectives
Haynes, John-Dylan, 2015, Neuron
A Primer on Pattern-Based
Approaches to fMRI:
Principles, Pitfalls, and Perspectives
	 John-Dylan Haynes, 2015, Neuron

2
The Aim of This Paper
• To provide a concise introduction to the key concepts of pattern-
based analysis
• To present an overview of challenges and limitations in the
interpretation of decoding results, especially with respect to underlying
neural population signals
3
• Eight voxels in visual cortex while a participant is viewing a
picture of a cat (A, red) or dog (B, green)

• The two response distributions are separable in case (C)

• In many cases, the marginal distributions for both conditions are
highly overlapping (D)

• A multivariate solution (E); LDA, SVM

• The weights of the classifier for each voxel can be plotted as a
weight map (H)

• In certain cases, response distributions cannot be sufficiently
partitioned using single linear decision boundaries (F)

• Nonlinear approaches: kNN classifier, nonlinear SVMs

• Often one might be interested in predicting a continuous
variable (G)

• Multivariate regression approaches

• Train the classifier

• Split training / testing data (H)

• Assess whether the classifier can correctly assign the labels
in test data = classification accuracy

• Repeated again using a different partitioning of data into
training and test = cross-validation

• It is absolutely vital that the training and test data are
independent and stationary in order to avoid overfitting
and circular inference.
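The train / test / cross-validate loop above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the procedure of any particular paper: the nearest-centroid classifier is swapped in for LDA/SVM to keep the code dependency-free, and the function names, the eight-voxel patterns, and the size of the class difference are all assumptions made for the example.

```python
import numpy as np

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test pattern to the class with the closest training mean."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test pattern to every centroid
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def cross_validated_accuracy(x, y, n_folds=5, seed=0):
    """Mean accuracy over folds; every sample is used for testing exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)  # training never sees test samples
        pred = nearest_centroid_predict(x[train_idx], y[train_idx], x[test_idx])
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies))

# Synthetic "cat" (0) vs "dog" (1) patterns over eight voxels
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 40)
patterns = rng.normal(size=(80, 8))
patterns[labels == 1] += 3.0  # large, purely illustrative class difference
accuracy = cross_validated_accuracy(patterns, labels)
```

The random fold assignment is only safe here because the synthetic samples are independent by construction. With real fMRI trials, the folds would have to respect whatever dependencies exist between samples.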
Analyzing Pattern fMRI Signals
4
Interpreting Accuracies
• Underestimating Information
• An absence of information at the level of fMRI does not mean that the local neural
populations do not contain information

• If neurons with different tuning properties were mixed randomly in a salt-and-pepper
fashion, then no macroscopic information would be expected at the voxel level

• A single neuron might contain substantial information that is drowned out by other
neurons only contributing noise
• Overestimating Information
• There are several ways in which an observed accuracy with fMRI might overestimate
the information

• A voxel might sample a large blood vessel that drains a large population of neurons
• = Aggregation of information that is not computationally used at the neural level.

• The low sampling rate of fMRI signals and the sluggishness of the hemodynamic
response

• might temporally integrate information beyond the relevant timescales of neural signal processing
5
• Comparing Different Brain Regions
• Several factors that limit the comparison of accuracies between different brain regions

• The size of regions

• The sensitivity of fMRI to neural activity (the local hemodynamic response
efficiency)

• The signal-to-noise levels also generally differ between regions

• Other Limitations
• The obtained accuracy depends on the partitioning of data into training and test.

• Less training data generally yields lower accuracies

• Experimental design efficiency

• Block based, trial based, or others?

• The level of temporal aggregation

• ISI?

• Smoothing
Interpreting Accuracies
6
Circularity And Overfitting
• Any dependency between training and test data is likely to cause
false-positive classification of the test data, even in the absence of information

• Double dipping: leakage of information between training and test data

• Overfitting phenomenon

• Overfitting can occur when an overly complex classifier is fit to the training data:
it works well on the training data but then fails to generalize to the test data.

• Testing the generalizability of a classifier on independent test data thus protects
against overfitting.

• When different classifiers are tried out on the same data …

• Overfitting can only be revealed by testing the accuracy on a further
independent test dataset.

• Nested cross-validation: split the data into training / validation / test sets

• Another solution: decrease the number of free parameters
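A small simulation can make the double-dipping point concrete. The sketch below is an assumption-laden toy: it uses pure-noise data, a nearest-centroid classifier, and voxel selection by mean class difference (none of which come from the slides), and compares selecting voxels on all data against selecting them inside each training fold.

```python
import numpy as np

def loo_accuracy(x, y, n_keep, select_inside_fold):
    """Leave-one-out nearest-centroid accuracy with top-n_keep voxel selection."""
    n = len(y)

    def top_voxels(xs, ys):
        diff = np.abs(xs[ys == 0].mean(axis=0) - xs[ys == 1].mean(axis=0))
        return np.argsort(diff)[-n_keep:]

    leaky_selection = top_voxels(x, y)  # uses ALL samples, test sample included
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside_fold:
            voxels = top_voxels(x[train], y[train])  # test sample never seen
        else:
            voxels = leaky_selection
        mean0 = x[train][y[train] == 0][:, voxels].mean(axis=0)
        mean1 = x[train][y[train] == 1][:, voxels].mean(axis=0)
        sample = x[i, voxels]
        # Predict class 1 when the held-out sample is closer to the class-1 centroid
        pred = int(np.sum((sample - mean1) ** 2) < np.sum((sample - mean0) ** 2))
        hits += int(pred == y[i])
    return hits / n

rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
noise = rng.normal(size=(40, 5000))  # pure noise: no label information at all
leaky = loo_accuracy(noise, labels, n_keep=20, select_inside_fold=False)
clean = loo_accuracy(noise, labels, n_keep=20, select_inside_fold=True)
```

Because the data contain no label information, any accuracy advantage of `leaky` over `clean` comes from information leaking from the test sample into the voxel selection.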
7
Interpreting Classification Maps
• Weight map
• In a linear classifier such as LDA or SVM, the weight at each voxel directly reflects the
contribution of that voxel to the classification result

• But it does not permit a conclusion as to whether an individual voxel contributed
significantly to the result

• Test whether it makes a significant difference if the voxel is included in the classifier

• A voxel might have a significant weight despite not having label-related information.

• Searchlight analysis
• depicts the centers of informative voxel clusters, not the informative voxels themselves.
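The claim that a voxel can carry a large weight without carrying label information can be checked with a two-voxel toy case. The sketch assumes the standard Fisher LDA solution, with weights proportional to the inverse within-class covariance times the difference of class means. All numbers are made up for illustration.

```python
import numpy as np

# Voxel 0 carries the label signal plus shared noise; voxel 1 carries only the
# shared noise. All numbers are illustrative assumptions.
mean_a = np.array([1.0, 0.0])    # class A mean (voxel 0, voxel 1)
mean_b = np.array([-1.0, 0.0])   # class B mean: voxel 1 does not differ
within_class_cov = np.array([[1.1, 1.0],
                             [1.0, 1.1]])  # noise is strongly shared

# Fisher LDA weight vector: proportional to inv(cov) @ (mean_a - mean_b)
weights = np.linalg.solve(within_class_cov, mean_a - mean_b)
```

Here `weights` has a large negative entry for voxel 1 even though its class means are identical: the classifier uses that voxel to cancel the noise it shares with voxel 0, not because it discriminates the labels.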
8
Controlling For Nuisance Variables
• A more detailed control for confounding factors is also necessary.
• Classifiers can extract information even if the sign of an effect
randomly varies across subjects

• Thus, more elaborate controls are needed to ensure that decoding
results do not merely reflect nuisance variables, such as difficulty or
attention

• Solutions

• Regress out the nuisance variable

• Directly compare decoding for nuisance variables and for the
cognitive factor of interest
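The first solution, regressing out the nuisance variable, can be sketched as ordinary least-squares residualization. The function name, the trial-wise `difficulty` variable, and the data shapes are hypothetical; the slides do not specify an implementation.

```python
import numpy as np

def regress_out(patterns, nuisance):
    """Remove the linear effect of a nuisance variable from every voxel.

    patterns: (n_trials, n_voxels); nuisance: (n_trials,), e.g. reaction time.
    """
    design = np.column_stack([np.ones_like(nuisance), nuisance])  # intercept + nuisance
    betas, *_ = np.linalg.lstsq(design, patterns, rcond=None)
    return patterns - design @ betas  # residuals, one column per voxel

rng = np.random.default_rng(0)
difficulty = rng.normal(size=50)  # hypothetical trial-wise nuisance variable
data = rng.normal(size=(50, 10)) + 2.0 * difficulty[:, None]
cleaned = regress_out(data, difficulty)
```

The residuals in `cleaned` are linearly uncorrelated with `difficulty`. Because the design includes an intercept, each voxel's mean is removed as well. Nonlinear dependence on the nuisance variable is not removed by this step.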
9
Extra. Information-based Approach
• What's in a pattern? Examining the type of signal
multivariate analysis uncovers at the group level
• Roee Gilron et al., 2017, NeuroImage

• At the second (group) level, multivariate analysis is often “information-based”,
whereas univariate analysis is “activation-based”.

• Information-based: the sign of the effect of individual
subjects is discarded and a non-directional summary
statistic is used

• Activation-based: both signal magnitude and sign are
taken into account

• Implicit paradigm shift in signal definition in univariate vs.
multivariate analysis.

• This paper…

• shows that directional and non-directional group-level
MVPA approaches uncover distinct brain regions
with only partial overlap.

• offers a resolution by proposing a multivariate activation-based
statistic.
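The directional versus non-directional distinction can be shown with a toy group of subjects whose effects have consistent magnitude but inconsistent sign. The numbers are hypothetical, and the two summaries below are simple stand-ins, not the statistics used by Gilron et al.

```python
import numpy as np

# Hypothetical signed effects for six subjects: similar magnitude, mixed sign
subject_effects = np.array([0.8, -0.9, 1.1, -1.0, 0.9, -0.9])

directional = subject_effects.mean()              # activation-based: sign retained
non_directional = np.abs(subject_effects).mean()  # information-based: sign discarded
```

The directional summary cancels to about zero, while the non-directional summary stays large. This is one way the two group-level approaches can highlight different regions.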
10
Summary
• The approach has to be applied carefully in order to avoid overfitting
of the large parameter spaces involved.

• Caution also is required when interpreting the results of classification
studies in terms of the information encoded in neural populations or in
the tuning of single neurons.

• (Consider using encoding models, or combining decoding with RSA)
11
The impact of study design on
pattern estimation for single-trial
multivariate pattern analysis
	 Jeanette A. Mumford et al., 2014, NeuroImage
12
Highlights
• Assessment of Type I error in pattern similarity and classification
analyses.

• Type I errors of similarity analyses are notably affected by study design.

• Classification analyses are more robust to study design choice.

• The optimal design for pattern similarity is to use between-run-based
patterns.

• The optimal analysis strategy for classification is between-run cross
validation.
13
Least Squares All Model
• All trials are estimated simultaneously in a single
model, using a separate regressor consisting of an
impulse (or boxcar) function convolved with a
double gamma hemodynamic response function
(HRF)

• = beta-series regression

• Pitfall: when trials have a short interstimulus
interval (ISI), e.g., less than 3 s between the end
of one stimulus and onset of the next stimulus, the
regressors become highly correlated, or
collinear, which inflates the variance of the
resulting parameter estimates.
14
Least Squares Single Model
• The LSS model reduces collinearity by using
a separate model for each trial, in which the
first regressor models the trial of interest and
the other two regressors model the remaining
trials according to trial type

• Only the first parameter estimate is retained in
each model and estimates the activation for
that individual trial

• LSS has been shown to produce higher
classification accuracies than LSA for short
ISIs (3–5 s)
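The two design matrices can be sketched as follows. This assumes impulse events, two trial types coded 0 and 1, and a double-gamma HRF with commonly used shape parameters (peak shape 6, undershoot shape 16, ratio 1/6); none of these values are given in the slides, and the onsets are hypothetical.

```python
import math
import numpy as np

def double_gamma_hrf(t):
    """Double-gamma HRF at times t (s); shape parameters are assumed values."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)  # zero before onset
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def lsa_design(onsets, n_tpts, tr):
    """LSA: one impulse regressor per trial, all in a single model."""
    times = np.arange(n_tpts) * tr
    return np.column_stack([double_gamma_hrf(times - onset) for onset in onsets])

def lss_design(onsets, trial_types, trial, n_tpts, tr):
    """LSS model for one trial: [trial of interest, other type-0, other type-1]."""
    x_lsa = lsa_design(onsets, n_tpts, tr)
    others = np.arange(len(onsets)) != trial
    columns = [x_lsa[:, trial]]
    for trial_type in (0, 1):
        # Sum the regressors of all remaining trials of this type
        columns.append(x_lsa[:, others & (trial_types == trial_type)].sum(axis=1))
    return np.column_stack(columns)

onsets = 10.0 + 5.0 * np.arange(8)  # hypothetical onsets (s)
trial_types = np.array([0, 1, 0, 1, 0, 1, 0, 1])
x_lsa = lsa_design(onsets, n_tpts=60, tr=2.0)
x_lss_first = lss_design(onsets, trial_types, trial=0, n_tpts=60, tr=2.0)
```

LSA yields one column per trial, whereas each LSS model has only three columns, which is where the reduction in collinearity comes from. A full LSS analysis would build and fit one such model per trial, keeping the first estimate each time.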
15
Overview: Pattern Similarity (Discussion)
• In the within-run setting,
• Even large ISIs (15 s) do not guarantee independence between the
pattern estimates and this can drive false positive differences.

• regardless of which of the three pattern estimators (LSS, LSA, Add6)
is used…

• The only way to control Type I error within run:

• randomly order the trials with a different randomization for each
subject
16
Overview: Pattern Classification (Discussion)
• Within-run CV
• would be susceptible to a peeking bias

• Especially the case for blocked and alternating trials when the ISI
was only 3 s long

• Between-run CV
• is stable regardless of trial order and is the recommended approach.

• Shorter ISI studies
• LSS model & between-run CV is overall more advantageous

• without any detriment to the Type I or II error rates!
17
Derivations
• BOLD time series Y

• Trial-specific activations β

• Vβ = true covariance between the trials

• = true representational similarity covariance matrix

• = the pattern similarity correlations can be derived

• Combining (1) and (2),

• The variance of Y



18
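$$Y = X_{LSA}\beta + \epsilon_Y, \quad \epsilon_Y \sim N(0, V_Y) \tag{1}$$
$$\beta = \mu + \epsilon_\beta, \quad \epsilon_\beta \sim N(0, V_\beta) \tag{2}$$
$$Y = X_{LSA}\mu + X_{LSA}\epsilon_\beta + \epsilon_Y \tag{3}$$
$$\mathrm{Var}(Y) = X_{LSA} V_\beta X'_{LSA} + V_Y \tag{4}$$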
Derivations
• Pattern distribution: LSA
• The trial-specific parameter estimates

• The true similarity between the estimated patterns of all pairs of trials, derived from Eqs
(4), (5)

• In the special case where the BOLD time series are uncorrelated, $V_Y = \sigma_y^2 I$, where $\sigma_y^2$
is the variance and $I$ is an $N_{tpts} \times N_{tpts}$ identity matrix, this variance reduces to Eq. (7)





19
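$$\hat{\beta}_{LSA} = (X'_{LSA}X_{LSA})^{-1}X'_{LSA}Y \tag{5}$$
$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_{LSA}) &= (X'_{LSA}X_{LSA})^{-1}X'_{LSA}\,\mathrm{Var}(Y)\,X_{LSA}(X'_{LSA}X_{LSA})^{-1} \\
&= V_\beta + (X'_{LSA}X_{LSA})^{-1}X'_{LSA}V_Y X_{LSA}(X'_{LSA}X_{LSA})^{-1}
\end{aligned} \tag{6}$$
$$\mathrm{Var}(\hat{\beta}_{LSA}) = V_\beta + \sigma_y^2 (X'_{LSA}X_{LSA})^{-1} \tag{7}$$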
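Eqs. (6) and (7) can be checked numerically. The sketch below assumes a hypothetical short-ISI design with impulse events, an identity true covariance, and the same assumed double-gamma HRF shape as elsewhere in these notes; it is an illustration of the algebra, not the simulation code of the paper.

```python
import math
import numpy as np

def hrf(t):
    """Double-gamma HRF; the shape parameters are assumed, not from the slides."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return t ** 5 * np.exp(-t) / math.gamma(6) - t ** 15 * np.exp(-t) / (6.0 * math.gamma(16))

def var_beta_lsa(x_lsa, v_beta, v_y):
    """Eq. (6): V_beta + (X'X)^-1 X' V_Y X (X'X)^-1."""
    pinv = np.linalg.solve(x_lsa.T @ x_lsa, x_lsa.T)  # (X'X)^-1 X'
    return v_beta + pinv @ v_y @ pinv.T

def cov_to_corr(cov):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

n_tpts, tr = 100, 2.0
onsets = 10.0 + 4.0 * np.arange(20)  # hypothetical short-ISI design
times = np.arange(n_tpts) * tr
x_lsa = np.column_stack([hrf(times - onset) for onset in onsets])

v_beta = np.eye(len(onsets))  # true patterns are uncorrelated
sigma2 = 1.0
general = var_beta_lsa(x_lsa, v_beta, sigma2 * np.eye(n_tpts))  # Eq. (6)
special = v_beta + sigma2 * np.linalg.inv(x_lsa.T @ x_lsa)      # Eq. (7)
similarity = cov_to_corr(special)  # implied correlations between trial estimates
```

With white noise, `general` and `special` agree. Since `v_beta` is the identity, any nonzero off-diagonal entry in `similarity` is induced purely by the design matrix.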
Derivations
• Pattern distribution: LSS
• The estimate for the first trial

• where c is the row vector [1, 0, 0], which selects the first of the three regressors

• All LSS-based trial estimates can simultaneously be estimated using

• where

• Combining this with the variance of Y given in (4) yields

• In the special case where $V_Y = \sigma_y^2 I$,

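20
$$\hat{\beta}_{LSS_i,1} = c\,(X'_{LSS_i}X_{LSS_i})^{-1}X'_{LSS_i}Y \tag{8}$$
$$\hat{\beta}_{LSS} = X_{LSS}Y \tag{9}$$
$$X_{LSS} = \begin{bmatrix}
c\,(X'_{LSS_1}X_{LSS_1})^{-1}X'_{LSS_1} \\
c\,(X'_{LSS_2}X_{LSS_2})^{-1}X'_{LSS_2} \\
\vdots \\
c\,(X'_{LSS_{N_{trials}}}X_{LSS_{N_{trials}}})^{-1}X'_{LSS_{N_{trials}}}
\end{bmatrix} \tag{10}$$
$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_{LSS}) &= X_{LSS}\,\mathrm{Var}(Y)\,X'_{LSS} \\
&= X_{LSS}X_{LSA}V_\beta X'_{LSA}X'_{LSS} + X_{LSS}V_Y X'_{LSS}
\end{aligned} \tag{11}$$
$$\mathrm{Var}(\hat{\beta}_{LSS}) = X_{LSS}X_{LSA}V_\beta X'_{LSA}X'_{LSS} + \sigma_y^2 X_{LSS}X'_{LSS} \tag{12}$$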
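The stacked LSS estimator of Eq. (10) and the variance of Eq. (12) can be built directly. As before, this is a sketch under assumptions: impulse events, two alternating trial types coded 0 and 1, hypothetical onsets, and an assumed double-gamma HRF shape.

```python
import math
import numpy as np

def hrf(t):
    """Double-gamma HRF; the shape parameters are assumed, not from the slides."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return t ** 5 * np.exp(-t) / math.gamma(6) - t ** 15 * np.exp(-t) / (6.0 * math.gamma(16))

def lss_estimator(x_lsa, trial_types):
    """Eq. (10): stack c (X_i' X_i)^-1 X_i' over trials; rows map Y to LSS estimates."""
    n_trials = x_lsa.shape[1]
    c = np.array([1.0, 0.0, 0.0])  # keep only the trial-of-interest estimate
    rows = []
    for i in range(n_trials):
        others = np.arange(n_trials) != i
        x_i = np.column_stack([
            x_lsa[:, i],                                        # trial of interest
            x_lsa[:, others & (trial_types == 0)].sum(axis=1),  # remaining type 0
            x_lsa[:, others & (trial_types == 1)].sum(axis=1),  # remaining type 1
        ])
        rows.append(c @ np.linalg.solve(x_i.T @ x_i, x_i.T))
    return np.vstack(rows)

n_tpts, tr, n_trials = 100, 2.0, 20
onsets = 10.0 + 4.0 * np.arange(n_trials)  # hypothetical short-ISI design
trial_types = np.arange(n_trials) % 2      # alternating trial order
times = np.arange(n_tpts) * tr
x_lsa = np.column_stack([hrf(times - onset) for onset in onsets])
x_lss = lss_estimator(x_lsa, trial_types)  # shape (n_trials, n_tpts)

v_beta = np.eye(n_trials)
sigma2 = 1.0
var_lss = x_lss @ x_lsa @ v_beta @ x_lsa.T @ x_lss.T + sigma2 * x_lss @ x_lss.T  # Eq. (12)

# All LSS estimates for one simulated time series in a single product, Eq. (9)
y = np.random.default_rng(0).normal(size=n_tpts)
beta_lss = x_lss @ y
```

Each entry of `beta_lss` equals the first coefficient of the corresponding three-regressor least-squares fit, so the stacked matrix is just a compact way of running all the per-trial models at once.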
Methods: Pattern Similarity (within)
• Experiment design
• With / without temporal autocorrelation: Eqs (6), (11) or Eqs (7), (12)

• 2 trial types: t1, t2

• trial numbers: 22 or 42

• Different lengths of ISI: mean 3s (2~5s), 7s (6~9s), (+15s)

• $\sigma_y^2 = 1$

• 225 time points (TR = 2s)

• Trial orderings: blocked, alternating (t1, t2, t1, t2, …), random order

• Hypothesis
• Whether within-trial-type similarities (wt1, wt2) differ from each other or from
between-trial-type similarities (bt1t2)

• Paired t-test: wt1-wt2, wt1-bt1t2, wt2-bt1t2
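The three summary similarities and the paired tests can be sketched as below. The helper names are hypothetical, trial types are coded 1 and 2, and plain averaging of correlations is an assumption; the slides do not say whether, for example, a Fisher z-transform was applied first.

```python
import numpy as np

def similarity_summaries(corr, trial_types):
    """Mean within-type (wt1, wt2) and between-type (bt1t2) similarity for one subject."""
    same = trial_types[:, None] == trial_types[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair once, no diagonal
    is_t1 = same & (trial_types[:, None] == 1)
    is_t2 = same & (trial_types[:, None] == 2)
    return (float(corr[upper & is_t1].mean()),
            float(corr[upper & is_t2].mean()),
            float(corr[upper & ~same].mean()))

def paired_t(a, b):
    """Paired t statistic across subjects for the difference a - b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff))))

# Toy similarity matrix for four trials: two of type 1, two of type 2
trial_types = np.array([1, 1, 2, 2])
corr = np.array([[1.0, 0.5, 0.1, 0.2],
                 [0.5, 1.0, 0.0, 0.1],
                 [0.1, 0.0, 1.0, 0.3],
                 [0.2, 0.1, 0.3, 1.0]])
wt1, wt2, bt1t2 = similarity_summaries(corr, trial_types)
```

In a group analysis, `similarity_summaries` would be computed once per subject and `paired_t` applied to the per-subject values, for instance `paired_t(all_wt1, all_bt1t2)` with hypothetical per-subject arrays.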
21
Methods: Pattern Similarity (within)
• Simulated data parameters
• Temporal covariance estimates VY are based on real data with 225 time points
(TR = 2s)

• estimated from 198 resting state data sets for the same ROI, a randomly
chosen 7 × 7 × 7 voxel cube in standard MNI space (Right Putamen)

• For each simulated subject, the true similarity covariance Vβ was assumed to be the identity

• A design matrix was randomly generated

• Eqs. (7) and (12) were used to compute the similarity matrices

• 10,000 data sets of 30 subjects were randomly generated, 

• using a different set of ISIs and randomly ordered trials, when applicable, for
each subject.

• An additional simulation: pseudorandom order
22
Methods: Pattern Similarity (between)
• The Type I error rates when similarities were computed between-run
• Eq. (2) was used to simulate activation magnitudes for 500 voxels

• The values for β were used to simulate time series following Eq. (1)

• The true similarity covariance, Vβ, was set to the identity matrix 

• The mean trial activation, µ, was set to a vector of zeros

• Temporal covariances (VY) were derived from resting state data
23
Methods: Classification (within & between)
• Simulated data parameters
• Mean ISI: 3s, 7s

• Trial orders: blocked, alternating, random

• Data for 1000 subjects were generated

• Additional pseudorandom ordering test in within-run CV

• based on a data set of 30 subjects

• Classification options
• SVM classifier (cost = 1)

• 2-fold CV 

• (within, random split; between, grouped)

• WR: within-run

• BR(Same): between-run & same ISIs and stimulus order

• BR(Diff): between-run & different ISIs and stimulus order
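The difference between the within-run random split and the between-run grouped split comes down to how fold indices are built. The sketch below is a hypothetical illustration with two runs of six trials; the function names are not from the paper.

```python
import numpy as np

def between_run_folds(run_ids):
    """Grouped split: each fold tests on one run and trains on the other runs."""
    return [(np.flatnonzero(run_ids != run), np.flatnonzero(run_ids == run))
            for run in np.unique(run_ids)]

def within_run_folds(n_trials, seed=0):
    """Random 2-fold split of the trials of a single run."""
    order = np.random.default_rng(seed).permutation(n_trials)
    half_a, half_b = np.array_split(order, 2)
    return [(half_a, half_b), (half_b, half_a)]

run_ids = np.repeat([1, 2], 6)  # hypothetical: two runs of six trials each
br_folds = between_run_folds(run_ids)
wr_folds = within_run_folds(6)
```

In `br_folds`, training and test trials never share a run, so they never share a time series. In `wr_folds`, neighbouring trials from the same run can land on opposite sides of the split, which is the situation in which peeking bias can arise.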
24
Results: Pattern Similarity (within)
• Fig 2. Impact of collinearity in LSS and LSA models on similarity
estimates
• Blocked design
25
Results: Pattern Similarity (within)
• Patterns of LSA
• At lag 1, one parameter estimate will be elevated and, to preserve the model fit, the
collinear counterparts' parameter estimates will be pushed in the opposite direction

• At lag 2, the two trials will be pushed in the same direction by their common collinear
neighbor, causing a positive correlation.

• Patterns of LSS
• Two blocks along the diagonal: 

• a weak collinearity occurs if the neighbors of a trial of interest are exemplars of the
same category

• With blocked trial order, almost all trials of t1 have t1 neighbors,

• that results in negatively biased similarity estimates between same category
• Strong positive correlations for early lags:

• Result of each trial's estimate coming from an independent model
• This weak effect will be shown to have a smaller, but opposite, impact on pattern
similarities.
26
• Fig 3. Distributions of paired similarity differences (30 subjects)
Results: Pattern Similarity (within)
27
Eqs. (7), (12)
Eqs. (6), (11)
• Table 1. Type I error rates across simulations
• Table 2. Pseudorandom
Results: Pattern Similarity (within & between)
28
• Fig 4. Classification accuracy distributions
Results: Classification
29
Results: Classification
• Classification accuracy distributions in Pseudorandom
30
Discussion
• No observed benefit with increased ISI
• Even with long ISIs there is an impact of trial order on pattern similarity
estimates.

• Effects driven by temporal autocorrelation occur because time points that are
closer tend to be more highly correlated

• In the case of the blocked trials…

• most between-trial-type similarities are very far apart and the within-trial-type similarities
are at smaller lags

• This is why the wt1–bt1t2 and wt2–bt1t2 distributions tend to be significantly larger than 0

• Generally, at small ISIs the different pattern estimators suffer from collinearity,
driven by positively correlated regressors in the models

• When the ISI is increased this alleviates collinearity, yet there will always be a
slight negative correlation between regressors…

• a weak effect, but enough to drive biases
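How regressor correlation changes with ISI can be examined for a pair of consecutive trials. The sketch assumes impulse events (so the ISI equals the onset difference), 225 time points at TR = 2 s as in the simulations, and an assumed double-gamma HRF shape; the first onset is an arbitrary choice.

```python
import math
import numpy as np

def hrf(t):
    """Double-gamma HRF; the shape parameters are assumed, not from the slides."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return t ** 5 * np.exp(-t) / math.gamma(6) - t ** 15 * np.exp(-t) / (6.0 * math.gamma(16))

def neighbor_regressor_corr(isi, n_tpts=225, tr=2.0, first_onset=20.0):
    """Pearson correlation between the regressors of two consecutive impulse trials."""
    times = np.arange(n_tpts) * tr
    first = hrf(times - first_onset)
    second = hrf(times - (first_onset + isi))
    return float(np.corrcoef(first, second)[0, 1])

# Correlation between neighbouring trial regressors at the three simulated mean ISIs
corr_by_isi = {isi: neighbor_regressor_corr(isi) for isi in (3.0, 7.0, 15.0)}
```

Under these assumptions the 3 s pair is positively correlated because the two responses overlap, while at 15 s the peak of the second response coincides with the undershoot of the first, which leaves a small negative correlation.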
31
Discussion
• Benefits of between-run similarity analyses
• The inflation of Type I error rates that arose in the within-run similarity
analyses was driven by correlations, 

• either between covariates in the model used to estimate the patterns, or
the temporal covariance

• The time series from two different runs are completely independent from each
other

• The simulations used temporal covariance estimates from different subjects 

• in place of temporal covariance estimates from two runs of the same
subject 

• Since temporal covariance for small differences in time seems to have the
largest impact, it seems that two runs, which would typically have a couple
of minutes between them, would not be problematic…
32
Discussion
• Why and when within-run CV fails
• Within-run cross validations are especially problematic for the blocked
trial order and somewhat for alternating trial orders

• Randomly ordered trials seem to perform fine within the within-run CV
setting

• The results vary with study design because of
different levels of peeking bias

• In the alternating case, the trials of the same class are always
separated by at least 1 other trial, so the relationship is weaker
33
Discussion
• Add6 model
• Intuitively, with the Add6 approach, one might think that there will not
be a model-based effect, since a model is not necessary to extract the
patterns

• However, the results of the Add6 model are very similar to LSA at a long
ISI of 15 s
34
Discussion
• Impact on future study designs
• It seems that using multiple, shorter runs would be more advantageous
than using longer runs (with between-run analysis)

• When one wants to capture different levels of learning,

• This would likely require both shorter runs and tasks where
learning occurs slowly enough that the ceiling is not immediately
reached
35

Ukraine War presentation: KNOW THE BASICS
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Pitfalls of multivariate pattern analysis (MVPA), fMRI

  • 1. CLMN Journal Club, 2019.03.13, Emily Yunha Shin
      • The impact of study design on pattern estimation for single-trial multivariate pattern analysis. Mumford, Jeanette A., Tyler Davis, and Russell A. Poldrack, 2014, NeuroImage
      • A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives. Haynes, John-Dylan, 2015, Neuron
  • 2. A Primer on Pattern-Based Approaches to fMRI: Principles, Pitfalls, and Perspectives (John-Dylan Haynes, 2015, Neuron)
  • 3. The Aim of This Paper
      • To provide a concise introduction to the key concepts of pattern-based analysis
      • To present an overview of challenges and limitations in the interpretation of decoding results, especially with respect to underlying neural population signals
  • 4. Analyzing Pattern fMRI Signals
      • Eight voxels in visual cortex while a participant is viewing a picture of a cat (A, red) or a dog (B, green)
      • The two response distributions are separable in case (C)
      • In many cases, the marginal distributions for both conditions are highly overlapping (D)
      • A multivariate solution (E): LDA, SVM
      • The weights of the classifier for each voxel can be plotted as a weight map (H)
      • In certain cases, response distributions cannot be sufficiently partitioned using a single linear decision boundary (F)
      • Nonlinear approaches: kNN classifier, nonlinear SVMs
      • Often one might be interested in predicting a continuous variable (G)
      • Multivariate regression approaches
      • Train the classifier
      • Split the data into training and test sets (H)
      • Assess whether the classifier can correctly assign the labels in the test data = classification accuracy
      • Repeat with a different partitioning of the data into training and test sets = cross-validation
      • It is absolutely vital that the training and test data are independent and stationary in order to avoid overfitting and circular inference.
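The train / test / repeat cycle described on this slide can be sketched as follows. This is a minimal illustration, not the procedure of either paper: the nearest-centroid classifier stands in for LDA or SVM, and the toy data (80 trials, 8 voxels, a mean shift of 0.8) and all function names are assumptions made for the example.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test pattern to the class with the closest mean training pattern."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test pattern to every class centroid
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def cross_validated_accuracy(X, y, n_folds=5, seed=0):
    """Mean accuracy over k folds; every trial is used for testing exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False          # test trials never enter training
        pred = nearest_centroid_predict(X[train_mask], y[train_mask], X[test_idx])
        accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs))

# Toy data (assumed): 80 trials x 8 voxels, condition 1 shifted by 0.8 in every voxel
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 8)) + 0.8 * y[:, None]
acc = cross_validated_accuracy(X, y)
print(f"cross-validated accuracy: {acc:.2f}")
```

Note that the random fold assignment here treats trials as independent, which is exactly the assumption the later slides question for trials taken from the same run.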
  • 5. Interpreting Accuracies
      • Underestimating Information
      • An absence of information at the level of fMRI does not mean that the local neural populations do not contain information
      • If neurons with different tuning properties were mixed randomly in a salt-and-pepper fashion, then no macroscopic information would be expected at the voxel level
      • A single neuron might contain substantial information that is drowned out by other neurons only contributing noise
      • Overestimating Information
      • There are several ways in which an observed accuracy with fMRI might overestimate the information
      • A voxel might sample a large blood vessel that drains a large population of neurons = aggregation of information that is not computationally used at the neural level
      • The low sampling rate of fMRI signals and the sluggishness of the hemodynamic response might temporally integrate information beyond the relevant timescales of neural signal processing
  • 6. Interpreting Accuracies
      • Comparing Different Brain Regions
      • Several factors limit the comparison of accuracies between different brain regions
      • The size of regions
      • The sensitivity of fMRI to neural activity (the local hemodynamic response efficiency)
      • The signal-to-noise levels also generally differ between regions
      • Other Limitations
      • The obtained accuracy depends on the partitioning of data into training and test sets
      • Less training data generally yields lower accuracies
      • Experimental design efficiency: block based, trial based, or others?
      • The level of temporal aggregation: ISI?
      • Smoothing
  • 7. Circularity And Overfitting
      • Any dependencies between training and test data are likely to cause false-positive classification of the test data even in the absence of information
      • Double dipping: leakage of information between training and test data
      • Overfitting phenomenon
      • Overfitting can occur if a too-complex classifier is fit to the training dataset: it works well on the training data, but then fails to generalize to the test data
      • Testing the generalizability of a classifier on independent test data thus protects against overfitting
      • When different classifiers are tried out on the same data, overfitting can only be revealed by testing the accuracy on a further independent test dataset
      • Nested cross-validation: split the data into training / validation / test sets
      • Another solution: decrease the number of free parameters
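Double dipping can be made concrete with pure-noise data. In the sketch below, voxels are selected either on all trials (circular, because the test trials influence the selection) or on the training trials of each fold only. The data sizes, the mean-difference selection rule, and the nearest-centroid classifier are assumptions chosen for illustration.

```python
import numpy as np

def select_voxels(X, y, k):
    """Indices of the k voxels with the largest absolute class mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(np.abs(diff))[-k:]

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Nearest-centroid accuracy for two classes labelled 0 and 1."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return np.mean((d1 < d0).astype(int) == y_test)

def cv_accuracy(X, y, k, circular, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        if circular:
            voxels = select_voxels(X, y, k)                  # uses the test trials too
        else:
            voxels = select_voxels(X[train], y[train], k)    # training trials only
        accs.append(centroid_accuracy(X[train][:, voxels], y[train],
                                      X[test_idx][:, voxels], y[test_idx]))
    return float(np.mean(accs))

rng = np.random.default_rng(42)
y = np.tile([0, 1], 20)
X = rng.normal(size=(40, 2000))      # pure noise: the labels carry no information
acc_circular = cv_accuracy(X, y, k=20, circular=True)
acc_proper = cv_accuracy(X, y, k=20, circular=False)
print(f"circular selection: {acc_circular:.2f}, selection inside folds: {acc_proper:.2f}")
```

With selection done inside each fold the accuracy should hover around chance, whereas the circular variant reports well-above-chance accuracy on data that contain no signal at all.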
  • 8. Interpreting Classification Maps
      • Weight map
      • In a linear classifier such as LDA or SVM, the weight at each voxel directly reflects the contribution of that voxel to the classification result
      • But it does not permit a conclusion as to whether an individual voxel contributed significantly to the result
      • Test whether it makes a significant difference if the voxel is included in the classifier
      • A voxel might have a significant weight despite not having label-related information
      • Searchlight analysis
      • Searchlight maps depict the centers of informative voxel clusters, but not the informative voxels themselves
  • 9. Controlling For Nuisance Variables
      • A more detailed control for confounding factors is also necessary
      • Classifiers can extract information even if the sign of an effect randomly varies across subjects
      • Thus, more elaborate controls are needed to ensure that decoding results do not merely reflect nuisance variables, such as difficulty or attention
      • Solutions
      • Regress out the nuisance variable
      • Directly compare decoding for nuisance variables and for the cognitive factor of interest
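The first solution, regressing out the nuisance variable, amounts to replacing each voxel's trial-wise response with its residual after a least-squares fit on the nuisance variable. A minimal sketch, where the function name `regress_out` and the toy "difficulty" variable are hypothetical:

```python
import numpy as np

def regress_out(X, nuisance):
    """Remove the linear effect of a nuisance variable (plus intercept) from every voxel.

    X: (n_trials, n_voxels) patterns; nuisance: (n_trials,) values such as difficulty.
    """
    Z = np.column_stack([np.ones(len(nuisance)), nuisance])
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)   # one fit per voxel, solved jointly
    return X - Z @ coef

# Toy example (assumed): 50 trials, 10 voxels, every voxel contaminated by difficulty
rng = np.random.default_rng(0)
difficulty = rng.normal(size=50)
X = rng.normal(size=(50, 10)) + 2.0 * difficulty[:, None]
X_clean = regress_out(X, difficulty)
max_corr = max(abs(np.corrcoef(X_clean[:, v], difficulty)[0, 1]) for v in range(10))
print(f"largest residual correlation with the nuisance variable: {max_corr:.1e}")
```

This removes only linear dependence on the nuisance variable; any nonlinear relationship would remain in the residuals.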
  • 10. Extra. Information-based Approach
      • What's in a pattern? Examining the type of signal multivariate analysis uncovers at the group level (Gilron, Roee et al., 2017, NeuroImage)
      • Second-level multivariate analysis is often "information-based", whereas univariate analysis is "activation-based"
      • Information-based: the sign of the effect of individual subjects is discarded and a non-directional summary statistic is used
      • Activation-based: both signal magnitude and sign are taken into account
      • Implicit paradigm shift in signal definition in univariate vs. multivariate analysis
      • This paper shows that directional and non-directional group-level MVPA approaches uncover distinct brain regions with only partial overlap
      • It offers a resolution by proposing a multivariate activation-based statistic
  • 11. Summary
      • The approach has to be applied carefully in order to avoid overfitting of the large parameter spaces involved
      • Caution is also required when interpreting the results of classification studies in terms of the information encoded in neural populations or in the tuning of single neurons
      • (Use in the form of encoding models or in combination with RSA)
  • 12. The impact of study design on pattern estimation for single-trial multivariate pattern analysis (Jeanette A. Mumford et al., 2014, NeuroImage)
  • 13. Highlights
      • Assessment of Type I error in pattern similarity and classification analyses
      • Type I errors of similarity analyses are notably affected by study design
      • Classification analyses are more robust to study design choice
      • The optimal design for pattern similarity is to use between-run-based patterns
      • The optimal analysis strategy for classification is between-run cross-validation
  • 14. Least Squares All Model
      • All trials are estimated simultaneously in a single model, using a separate regressor for each trial, consisting of an impulse (or boxcar) function convolved with a double-gamma hemodynamic response function (HRF)
      • = beta-series regression
      • Pitfall: when trials have a short interstimulus interval (ISI), e.g., less than 3 s between the end of one stimulus and the onset of the next, the regressors become highly correlated, or collinear, which inflates the variance of the resulting parameter estimates
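The collinearity pitfall can be seen by building an LSA design matrix and correlating the regressors of neighbouring trials. The sketch below makes several assumptions not stated in the slides: a double-gamma HRF with shape parameters 6 and 16 and an undershoot ratio of 1/6, a 1 s sampling interval, impulse (not boxcar) events, and fixed onset spacings of 3 s and 15 s.

```python
import math
import numpy as np

def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (parameter values are assumed)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()

def lsa_design(onsets, n_tpts, tr):
    """LSA design: one column per trial, an impulse at onset convolved with the HRF."""
    hrf = double_gamma_hrf(tr)
    X = np.zeros((n_tpts, len(onsets)))
    for j, onset in enumerate(onsets):
        stick = np.zeros(n_tpts)
        stick[int(round(onset / tr))] = 1.0
        X[:, j] = np.convolve(stick, hrf)[:n_tpts]
    return X

def mean_neighbor_correlation(X):
    """Average correlation between the regressors of temporally adjacent trials."""
    r = np.corrcoef(X, rowvar=False)
    return float(np.mean(np.diag(r, k=1)))

tr, n_tpts, n_trials = 1.0, 400, 20
r_short = mean_neighbor_correlation(lsa_design(10 + 3.0 * np.arange(n_trials), n_tpts, tr))
r_long = mean_neighbor_correlation(lsa_design(10 + 15.0 * np.arange(n_trials), n_tpts, tr))
print(f"neighbouring-regressor correlation: 3 s spacing {r_short:.2f}, 15 s spacing {r_long:.2f}")
```

The short-spacing design should show a large positive correlation between neighbouring regressors, while the long-spacing design should show a correlation close to zero.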
  • 15. Least Squares Single Model
      • The LSS model reduces collinearity by using a separate model for each trial, in which the first regressor models the trial of interest and the other two regressors model the remaining trials according to trial type
      • Only the first parameter estimate is retained in each model; it estimates the activation for that individual trial
      • LSS has been shown to produce higher classification accuracies than LSA for short ISIs (3–5 s)
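A minimal sketch of the LSS procedure for two trial types follows. It takes an LSA-style matrix (one column per trial) as input and builds the three-regressor model for each trial. The toy design uses a gamma-shaped response as an assumed stand-in for the HRF, and the data are noise-free so that the recovered values can be checked directly.

```python
import numpy as np

def lss_estimates(X_lsa, trial_types, y):
    """Least Squares Single: one three-regressor model per trial, keep the first estimate."""
    trial_types = np.asarray(trial_types)
    n_trials = X_lsa.shape[1]
    betas = np.empty(n_trials)
    for i in range(n_trials):
        others = np.arange(n_trials) != i
        same = others & (trial_types == trial_types[i])
        diff = trial_types != trial_types[i]
        X_i = np.column_stack([
            X_lsa[:, i],                  # regressor 1: the trial of interest
            X_lsa[:, same].sum(axis=1),   # regressor 2: remaining trials of the same type
            X_lsa[:, diff].sum(axis=1),   # regressor 3: all trials of the other type
        ])
        coef, *_ = np.linalg.lstsq(X_i, y, rcond=None)
        betas[i] = coef[0]
    return betas

# Toy design (assumed): gamma-shaped response, 3-sample spacing, alternating trial types
t = np.arange(20.0)
response = t ** 5 * np.exp(-t)
response /= response.max()
n_tpts, n_trials = 120, 20
X_lsa = np.zeros((n_tpts, n_trials))
for j in range(n_trials):
    onset = 10 + 3 * j
    X_lsa[onset:onset + len(response), j] = response

trial_types = np.tile([0, 1], n_trials // 2)
true_beta = np.where(trial_types == 0, 1.0, 2.0)
y = X_lsa @ true_beta                 # noise-free time series for one voxel
betas = lss_estimates(X_lsa, trial_types, y)
```

In this noise-free case with identical amplitudes within each trial type, every per-trial model is exactly correct, so the estimates match the true values; with trial-to-trial variability or noise they would only approximate them.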
  • 16. Overview: Pattern Similarity (Discussion)
      • In the within-run setting, even large ISIs (15 s) do not guarantee independence between the pattern estimates, and this can drive false-positive differences
      • This holds regardless of which of the three pattern estimators (LSS, LSA, Add6) is used
      • The only way to preserve the Type I error rate within a run: randomly order the trials, with a different randomization for each subject
  • 17. Overview: Pattern Classification (Discussion)
      • Within-run CV
      • Is susceptible to a peeking bias
      • This is especially the case for blocked and alternating trials when the ISI is only 3 s long
      • Between-run CV
      • Is stable regardless of trial order and is the recommended approach
      • Shorter ISI studies
      • The LSS model with between-run CV is overall more advantageous, without any detriment to the Type I or II error rates
  • 18. Derivations
      • BOLD time series $Y$ and trial-specific activations $\beta$:
        $Y = X_{LSA}\beta + \epsilon_Y, \quad \epsilon_Y \sim N(0, V_Y)$  (1)
        $\beta = \mu + \epsilon_\beta, \quad \epsilon_\beta \sim N(0, V_\beta)$  (2)
      • $V_\beta$ = true covariance between the trials = true representational similarity covariance matrix, from which the pattern similarity correlations can be derived
      • Combining (1) and (2):
        $Y = X_{LSA}\mu + X_{LSA}\epsilon_\beta + \epsilon_Y$  (3)
      • The variance of $Y$:
        $\mathrm{Var}(Y) = X_{LSA} V_\beta X_{LSA}' + V_Y$  (4)
  • 19. Derivations
      • Pattern distribution: LSA
      • The trial-specific parameter estimates:
        $\hat{\beta}_{LSA} = (X_{LSA}' X_{LSA})^{-1} X_{LSA}' Y$  (5)
      • The true similarity between the estimated patterns of all pairs of trials, derived from Eqs. (4) and (5):
        $\mathrm{Var}(\hat{\beta}_{LSA}) = (X_{LSA}' X_{LSA})^{-1} X_{LSA}' \mathrm{Var}(Y) X_{LSA} (X_{LSA}' X_{LSA})^{-1} = V_\beta + (X_{LSA}' X_{LSA})^{-1} X_{LSA}' V_Y X_{LSA} (X_{LSA}' X_{LSA})^{-1}$  (6)
      • In the special case where the BOLD time series are uncorrelated, $V_Y = \sigma_Y^2 I$, where $\sigma_Y^2$ is the variance and $I$ is an $N_{tpts} \times N_{tpts}$ identity matrix, this variance reduces to:
        $\mathrm{Var}(\hat{\beta}_{LSA}) = V_\beta + \sigma_Y^2 (X_{LSA}' X_{LSA})^{-1}$  (7)
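Eq. (7) is straightforward to evaluate numerically, and converting the resulting covariance to a correlation matrix shows the similarity structure that the design alone induces. The sketch below assumes a toy design (gamma-shaped response, 3-sample trial spacing), an identity true similarity, and unit noise variance; `toy_design` is a hypothetical helper.

```python
import numpy as np

def toy_design(n_trials=20, spacing=3, n_tpts=120):
    """One column per trial: a gamma-shaped response (assumed stand-in for the HRF)."""
    t = np.arange(20.0)
    response = t ** 5 * np.exp(-t)
    response /= response.max()
    X = np.zeros((n_tpts, n_trials))
    for j in range(n_trials):
        onset = 10 + spacing * j
        X[onset:onset + len(response), j] = response
    return X

def lsa_pattern_covariance(X_lsa, V_beta, sigma2_y=1.0):
    """Eq. (7): Var(beta_hat_LSA) = V_beta + sigma_Y^2 (X'X)^{-1}, white-noise case."""
    return V_beta + sigma2_y * np.linalg.inv(X_lsa.T @ X_lsa)

def cov_to_corr(V):
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)

X = toy_design()
V = lsa_pattern_covariance(X, V_beta=np.eye(X.shape[1]))
R = cov_to_corr(V)
lag1 = float(np.mean(np.diag(R, k=1)))   # mean similarity of adjacent trials
lag2 = float(np.mean(np.diag(R, k=2)))   # mean similarity of trials two apart
print(f"induced similarity at lag 1: {lag1:.2f}, at lag 2: {lag2:.2f}")
```

Even though the true similarity is the identity here, the off-diagonal entries of `R` are non-zero: they come entirely from the $(X_{LSA}' X_{LSA})^{-1}$ term.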
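  • 20. Derivations
      • Pattern distribution: LSS
      • The estimate for trial $i$ is the first parameter of the $i$-th LSS model:
        $\hat{\beta}_{LSS_i,1} = c\,(X_{LSS_i}' X_{LSS_i})^{-1} X_{LSS_i}' Y$  (8)
        where $c$ is the row vector $[1, 0, 0]$ that selects the first of the three regressors
      • All LSS-based trial estimates can be obtained simultaneously using
        $\hat{\beta}_{LSS} = X_{LSS} Y$  (9)
        where $X_{LSS}$ stacks the per-trial estimator rows:
        $X_{LSS} = \begin{bmatrix} c\,(X_{LSS_1}' X_{LSS_1})^{-1} X_{LSS_1}' \\ c\,(X_{LSS_2}' X_{LSS_2})^{-1} X_{LSS_2}' \\ \vdots \\ c\,(X_{LSS_{N_{trials}}}' X_{LSS_{N_{trials}}})^{-1} X_{LSS_{N_{trials}}}' \end{bmatrix}$  (10)
      • Combining this with the variance of $Y$ given in (4) yields
        $\mathrm{Var}(\hat{\beta}_{LSS}) = X_{LSS} \mathrm{Var}(Y) X_{LSS}' = X_{LSS} X_{LSA} V_\beta X_{LSA}' X_{LSS}' + X_{LSS} V_Y X_{LSS}'$  (11)
      • In the special case where $V_Y = \sigma_Y^2 I$:
        $\mathrm{Var}(\hat{\beta}_{LSS}) = X_{LSS} X_{LSA} V_\beta X_{LSA}' X_{LSS}' + \sigma_Y^2 X_{LSS} X_{LSS}'$  (12)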
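The stacked estimator matrix of Eq. (10) and the white-noise covariance of Eq. (12) can be computed as sketched below. The toy design, the alternating trial order, and the helper names are assumptions; `np.linalg.pinv` is used because, for a full-column-rank per-trial design, its first row equals $c\,(X_{LSS_i}' X_{LSS_i})^{-1} X_{LSS_i}'$.

```python
import numpy as np

def toy_design(n_trials=20, spacing=3, n_tpts=120):
    """One column per trial: a gamma-shaped response (assumed stand-in for the HRF)."""
    t = np.arange(20.0)
    response = t ** 5 * np.exp(-t)
    response /= response.max()
    X = np.zeros((n_tpts, n_trials))
    for j in range(n_trials):
        onset = 10 + spacing * j
        X[onset:onset + len(response), j] = response
    return X

def lss_estimator_matrix(X_lsa, trial_types):
    """Eq. (10): row i is c (X_i' X_i)^{-1} X_i' with c = [1, 0, 0]."""
    trial_types = np.asarray(trial_types)
    n_tpts, n_trials = X_lsa.shape
    rows = np.empty((n_trials, n_tpts))
    for i in range(n_trials):
        others = np.arange(n_trials) != i
        same = others & (trial_types == trial_types[i])
        diff = trial_types != trial_types[i]
        X_i = np.column_stack([X_lsa[:, i],
                               X_lsa[:, same].sum(axis=1),
                               X_lsa[:, diff].sum(axis=1)])
        rows[i] = np.linalg.pinv(X_i)[0]   # first row of the per-trial estimator
    return rows

def lss_pattern_covariance(X_lsa, trial_types, V_beta, sigma2_y=1.0):
    """Eq. (12): Var(beta_hat_LSS) in the white-noise special case."""
    A = lss_estimator_matrix(X_lsa, trial_types)
    AX = A @ X_lsa
    return AX @ V_beta @ AX.T + sigma2_y * (A @ A.T)

X_lsa = toy_design()
types = np.tile([0, 1], X_lsa.shape[1] // 2)     # alternating order (assumed)
V_lss = lss_pattern_covariance(X_lsa, types, V_beta=np.eye(X_lsa.shape[1]))
sd = np.sqrt(np.diag(V_lss))
R_lss = V_lss / np.outer(sd, sd)                 # induced LSS similarity matrix
```

Changing `types` to a blocked or randomly permuted order lets one inspect how trial order alone changes the induced similarity matrix.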
  • 21. Methods: Pattern Similarity (within)
      • Experiment design
      • With / without temporal autocorrelation: Eqs. (6), (11) / Eqs. (7), (12)
      • 2 trial types: t1, t2
      • Trial numbers: 22 or 42
      • Different lengths of ISI: mean 3 s (2–5 s), 7 s (6–9 s), (+15 s)
      • $\sigma_Y^2 = 1$
      • 225 time points (TR = 2 s)
      • Trial orderings: blocked, alternating (t1, t2, t1, t2, …), random order
      • Hypothesis
      • Whether within-trial-type similarities (wt1, wt2) differ from each other or from between-trial-type similarities (bt1t2)
      • Paired t-tests: wt1–wt2, wt1–bt1t2, wt2–bt1t2
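The three summary values and the paired comparison can be sketched as below. The function names and the small example matrix are hypothetical; in the study each subject would contribute one (wt1, wt2, bt1t2) triple computed from that subject's similarity matrix.

```python
import numpy as np

def similarity_contrasts(R, trial_types):
    """Mean within-type (wt1, wt2) and between-type (bt1t2) similarity for one subject."""
    t = np.asarray(trial_types)
    i1, i2 = np.flatnonzero(t == 1), np.flatnonzero(t == 2)

    def mean_offdiag(idx):
        block = R[np.ix_(idx, idx)]
        return block[~np.eye(len(idx), dtype=bool)].mean()   # skip self-similarity

    return mean_offdiag(i1), mean_offdiag(i2), R[np.ix_(i1, i2)].mean()

def paired_t(a, b):
    """Paired t statistic across subjects for the difference a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Hypothetical similarity matrix for four trials: two of type 1, then two of type 2
R = np.array([[1.0, 0.5, 0.1, 0.1],
              [0.5, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.3],
              [0.1, 0.1, 0.3, 1.0]])
wt1, wt2, bt1t2 = similarity_contrasts(R, [1, 1, 2, 2])
t_stat = paired_t([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])   # three hypothetical subjects
```

Under the null hypothesis of the simulations (true similarity equal to the identity), all three paired differences should be centred on zero; a t statistic that is systematically non-zero indicates a design-induced bias.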
  • 22. Methods: Pattern Similarity (within)
      • Simulated data parameters
      • Temporal covariance estimates $V_Y$ are based on real data with 225 time points (TR = 2 s)
      • They were estimated from 198 resting-state data sets for the same ROI, a randomly chosen 7 × 7 × 7 voxel cube in standard MNI space (right putamen)
      • For each simulated subject, the true similarity $V_\beta$ was assumed to be the identity
      • A design matrix was randomly generated
      • Eqs. (7) and (12) were used to compute the similarity matrices
      • 10,000 data sets of 30 subjects were randomly generated, using a different set of ISIs and, when applicable, a different random trial order for each subject
      • An additional simulation: pseudorandom order
  • 23. Methods: Pattern Similarity (between)
      • The Type I error rates when similarities were computed between runs
      • Eq. (2) was used to simulate activation magnitudes for 500 voxels
      • The values for $\beta$ were used to simulate time series following Eq. (1)
      • The true similarity covariance, $V_\beta$, was set to the identity matrix
      • The mean trial activation, $\mu$, was set to a vector of zeros
      • Temporal covariances ($V_Y$) were derived from resting-state data
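The generative steps on this slide (draw $\beta$ from Eq. (2), then build time series with Eq. (1)) can be sketched as follows. One loud simplification: the noise here is white with unit variance, whereas the slides state that the temporal covariance came from resting-state data. The toy design and all names are assumptions.

```python
import numpy as np

def toy_design(n_trials=10, spacing=6, n_tpts=120):
    """One column per trial: a gamma-shaped response (assumed stand-in for the HRF)."""
    t = np.arange(20.0)
    response = t ** 5 * np.exp(-t)
    response /= response.max()
    X = np.zeros((n_tpts, n_trials))
    for j in range(n_trials):
        onset = 10 + spacing * j
        X[onset:onset + len(response), j] = response
    return X

def simulate_voxels(X_lsa, n_voxels, V_beta, mu, sigma_y=1.0, seed=0):
    """Eq. (2) then Eq. (1): draw trial activations, then build noisy time series."""
    rng = np.random.default_rng(seed)
    n_tpts = X_lsa.shape[0]
    beta = rng.multivariate_normal(mu, V_beta, size=n_voxels).T   # (n_trials, n_voxels)
    noise = sigma_y * rng.standard_normal((n_tpts, n_voxels))     # white noise (assumed)
    return X_lsa @ beta + noise, beta

X = toy_design()
n_trials = X.shape[1]
Y, beta = simulate_voxels(X, n_voxels=500, V_beta=np.eye(n_trials), mu=np.zeros(n_trials))

# LSA estimate of every trial's pattern, then trial-by-trial pattern similarity
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (n_trials, n_voxels)
R = np.corrcoef(beta_hat)                          # correlation across voxels
```

Because the true similarity is the identity, any structure in the off-diagonal of `R` beyond sampling noise reflects the estimation procedure, not the simulated activations.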
  • 24. Methods: Classification (within & between)
      • Simulated data parameters
      • Mean ISI: 3 s, 7 s
      • Trial orders: blocked, alternating, random
      • Data for 1000 subjects were generated
      • Additional pseudorandom ordering test in within-run CV, based on a data set of 30 subjects
      • Classification options
      • SVM classifier (cost = 1)
      • 2-fold CV (within: random split; between: grouped)
      • WR: within-run
      • BR(Same): between-run, same ISIs and stimulus order
      • BR(Diff): between-run, different ISIs and stimulus order
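The difference between the two 2-fold schemes lies only in how trials are assigned to folds. A minimal sketch of both splitters, with hypothetical function names and a toy layout of two runs with six trials each:

```python
import numpy as np

def between_run_folds(run_labels):
    """Grouped split: each fold tests on one whole run and trains on the others."""
    runs = np.asarray(run_labels)
    for r in np.unique(runs):
        yield np.flatnonzero(runs != r), np.flatnonzero(runs == r)

def within_run_folds(n_trials, seed=0):
    """Random 2-fold split of the trials of a single run."""
    idx = np.random.default_rng(seed).permutation(n_trials)
    half_a, half_b = np.array_split(idx, 2)
    yield np.sort(half_a), np.sort(half_b)
    yield np.sort(half_b), np.sort(half_a)

runs = np.repeat([1, 2], 6)                # two runs of six trials each (assumed)
br_folds = list(between_run_folds(runs))
wr_folds = list(within_run_folds(6))

for train_idx, test_idx in br_folds:
    # No run contributes trials to both the training and the test set
    assert set(runs[train_idx]).isdisjoint(runs[test_idx])
```

In the within-run split, temporally adjacent trials can land on opposite sides of the train/test boundary, which is the route by which the peeking bias described in the slides can arise.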
  • 25. Results: Pattern Similarity (within)
      • Fig 2. Impact of collinearity in LSS and LSA models on similarity estimates
      • Blocked design
  • 26. Results: Pattern Similarity (within)
      • Patterns of LSA
      • At lag 1, one parameter estimate will be elevated and, to preserve the model fit, the collinear counterpart's parameter estimate will be pushed in the opposite direction
      • At lag 2, the two trials will be pushed in the same direction by their common collinear neighbor, causing a positive correlation
      • Patterns of LSS
      • Two blocks along the diagonal: a weak collinearity occurs if the neighbors of a trial of interest are exemplars of the same category
      • With blocked trial order, almost all trials of t1 have t1 neighbors, which results in negatively biased similarity estimates between trials of the same category
      • Strong positive correlations for early lags: a result of each trial's estimate coming from an independent model
      • This weak effect will be shown to have a smaller, but opposite, impact on pattern similarities
  • 27. Results: Pattern Similarity (within)
      • Fig 3. Distributions of paired similarity differences (30 subjects); panels: Eqs. (7), (12) and Eqs. (6), (11)
  • 28. Results: Pattern Similarity (within & between)
      • Table 1. Type I error rates across simulations
      • Table 2. Pseudorandom
  • 29. Results: Classification
      • Fig 4. Classification accuracy distributions
  • 30. Results: Classification
      • Classification accuracy distributions for the pseudorandom order
  • 31. Discussion
      • No observed benefit with increased ISI
      • Even with long ISIs there is an impact of trial order on pattern similarity estimates
      • Effects driven by temporal autocorrelation occur because time points that are closer together tend to be more highly correlated
      • In the case of blocked trials, most between-trial-type pairs are very far apart in time, whereas the within-trial-type pairs are at smaller lags
      • This is why the wt1–bt1t2 and wt2–bt1t2 distributions tend to be significantly larger than 0
      • Generally, at small ISIs the different pattern estimators suffer from collinearity, driven by positively correlated regressors in the models
      • Increasing the ISI alleviates collinearity, yet there will always be a slight negative correlation between regressors: a weak effect, but enough to drive biases
  • 32. Discussion
      • Benefits of between-run similarity analyses
      • The inflation of Type I error rates that arose in the within-run similarity analyses was driven by correlations, either between covariates in the model used to estimate the patterns, or in the temporal covariance
      • The time series from two different runs are completely independent of each other
      • The simulations used temporal covariance estimates from different subjects in place of temporal covariance estimates from two runs of the same subject
      • Since temporal covariance for small differences in time seems to have the largest impact, two runs, which would typically have a couple of minutes between them, should not be problematic
  • 33. Discussion
      • Why and when within-run CV fails
      • Within-run cross-validation is especially problematic for the blocked trial order and somewhat problematic for alternating trial orders
      • Randomly ordered trials seem to perform fine in the within-run CV setting
      • The results vary with study design because of different levels of peeking bias
      • In the alternating case, trials of the same class are always separated by at least one other trial, so the relationship is weaker
  • 34. Discussion
      • Add6 model
      • Intuitively, with the Add6 approach, one might think that there will not be a model-based effect, since a model is not necessary to extract the patterns
      • However, the results of the Add6 model are very similar to LSA at a long ISI of 15 s
  • 35. Discussion
      • Impact on future study designs
      • It seems that using multiple, shorter runs would be more advantageous than using longer runs (with between-run analysis)
      • When we want to capture different levels of learning, this would likely require both shorter runs and tasks where learning occurs slowly enough that ceiling is not immediately reached