You Thought What?! The Promise of Real-Time Brain Decoding
What can faster machine learning and new model-based approaches tell us about what someone is really thinking? Recently, Intel joined up with some of the pioneers of brain decoding to understand exactly that. Using functional MRI as our microscope, we began analyzing large amounts of high-dimensional 4-D image data to uncover brain networks that support cognitive processes. But existing image preprocessing, feature selection, and classification techniques are too slow and inaccurate to facilitate the most exciting breakthroughs. In this talk, we’ll discuss the promise of accurate real-time brain decoding and the computational headwinds it faces. And we’ll look at some of the approaches to algorithms and optimization that Intel Labs and its partners are taking to reduce the barriers.
5. The brain: The black box at the end of our necks
• Facts:
Only 2% of body weight but uses up to 20% of energy
~200B neurons
Neurons fire up to ~10 kHz
1K to 10K connections per neuron
• The cerebral neocortex (the “mammalian brain” associated with higher reasoning):
~20B neurons
~125 trillion synapses
There are more ways to organize the neocortex’s ~125 trillion synapses than stars in the known universe.
6. [Diagram labels: stimulus (task), mind, brain, dataset?, computational model?]
What is present in the mind as the task is performed?
What is attended to in the mind as the task is performed?
Adapted from Francisco Pereira, Botvinick Lab, Princeton
7. Non-invasive neuroimaging
[Figure: imaging modalities grouped by the phenomena they measure; axis label: better spatial resolution]
Electrical phenomena: Consumer EEG (fewer sensors), Lab/Medical EEG (more sensors), Magnetoencephalography (MEG)
Metabolic phenomena: Near-Infrared Spectroscopy (fNIRS), Positron Emission Tomography, Functional Magnetic Resonance Imaging (fMRI)
Varying portability, temporal & spatial resolution. fMRI is the workhorse of brain research despite the disadvantages of non-portability & expense
8. Real-Time Functional MRI (rtfMRI)
[Figure labels: metabolic brain, anatomical brain]
Adapted from graphic by Jeremy Manning, Princeton
9. [Diagram labels: stimulus (task), mind, brain, rtfMRI, classifier]
Conclusions from structure of the learnt model (dependent on prediction model): weights on features, hidden layers
Conclusions from feature choice (dependent on experiment): voxel location, voxel behavior, time within trial
Adapted from Francisco Pereira, Botvinick Lab, Princeton
10. Studying attention | dueling categories
[Figure: % BOLD change over time in the fusiform face area (FFA) and the parahippocampal place area (PPA) during face attention and scene (place) attention]
e.g., O’Craven et al., 1999, Nature
13. Standard types of fMRI analysis. (A) Univariate activation refers to the average amplitude of BOLD activity evoked by events of an experimental condition.
N B Turk-Browne Science 2013;342:580-584
*BOLD: blood oxygenation level–dependent (BOLD) contrast imaging
14. Standard types of fMRI analysis
Basic (i.e. common) Analysis
Advanced Analysis: MVPA, FCMA
*MVPA: Multivariate Pattern Analysis
*FCMA: Full Correlation Matrix Analysis
N B Turk-Browne Science 2013;342:580-584
15. Offline fMRI image analysis experiment
Pipeline stages: data acquisition, preprocessing, voxel analysis, classifier testing, analyze results
Processing time … 6 to 55 hours
Courtesy of Nick Turk-Browne, Princeton
17. Studying attention | real-time neurofeedback
Starting stimulus
Attend to scene
MORE scene evidence: Rewarded with stronger stimulus and easier task
LESS scene evidence: Punished with degraded stimulus and harder task
Courtesy of Nick Turk-Browne, Princeton
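The reward and punishment rule on this slide can be sketched as a mapping from classifier output to the strength of the attended image in the next display. This is only an illustration: the function name `scene_proportion`, the linear mapping, and the `lo`/`hi` bounds are hypothetical and do not come from the talk, which does not say how evidence is converted into stimulus strength.

```c
#include <assert.h>

/* Hypothetical transfer function (not from the talk).
 * evidence: classifier output for the attended category minus the
 *           unattended one, nominally in [-1, 1].
 * lo, hi:   smallest and largest allowed proportion of the attended
 *           (scene) image in the next composite stimulus.
 * Returns the scene proportion: more scene evidence gives a stronger
 * scene (easier task), less evidence a degraded one (harder task). */
float scene_proportion(float evidence, float lo, float hi)
{
    assert(lo >= 0.0f && hi <= 1.0f && lo < hi);
    /* Linear map from [-1, 1] onto [lo, hi], then clamp. */
    float p = lo + (hi - lo) * 0.5f * (evidence + 1.0f);
    if (p < lo) p = lo;
    if (p > hi) p = hi;
    return p;
}
```

Calling `scene_proportion(evidence, 0.2f, 0.8f)` once per scan would keep the scene between 20% and 80% of the composite image; those bounds are placeholders, not values from the study.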
18. Closed-loop rtfMRI neurofeedback system
Loop stages: data acquisition, real-time preprocessing, real-time voxel analysis, classifier testing, update stimulus display
Processing time … 6 to 55 hours
19. Studying attention | training and scoring
Neurofeedback
Use multivariate pattern analysis (MVPA) over whole-brain activity to decode attention to faces vs. scenes
Mean cross-validation accuracy = 78% ***
Norman et al. (2006), LaConte (2011). Regularized logistic regression (penalty = 1), *** p < 0.001
22. This was done with MVPA. We’d also like to try FCMA to include connectivity information, but...
23. A Big Data/HPC challenge
Some facts:
To keep up with the rtfMRI scanner, must process a full brain scan and provide feedback in <1 sec (for a 2 sec TR)
Raw image data for 1 subject: ~480 Gbytes
Some studies train on 100s of subjects
Running correlations across all subjects involves a lot of data movement
Processing is expensive:
N ~ 100K voxels per time slice
O(N^2) for basic preprocessing (minutes today)
O(N^3) to process the full correlation matrix (hours today)
[Figure: raw fMRI data and the patterns of correlated voxels derived from it]
Image Sources: Princeton Neuroscience Institute and Wikipedia
“Train classifier on 100s of subjects during coffee break, classify a subject’s patterns in <1 sec.”
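To make the scale concrete: with N ~ 100K voxels there are about 5 billion distinct voxel pairs, and one full N-by-N correlation matrix in single precision occupies 40 GB. The two helper functions below are hypothetical names introduced only to show the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct voxel pairs (upper triangle, diagonal excluded). */
uint64_t voxel_pairs(uint64_t n)
{
    assert(n <= 0x7FFFFFFFu);   /* keeps n * (n - 1) within 64 bits */
    return n * (n - 1) / 2;
}

/* Bytes needed for a full n-by-n single-precision correlation matrix. */
uint64_t full_corr_bytes(uint64_t n)
{
    assert(n <= 0x7FFFFFFFu);   /* keeps n * n * sizeof(float) within 64 bits */
    return n * n * (uint64_t)sizeof(float);
}
```

For example, `voxel_pairs(100000)` is 4,999,950,000 and `full_corr_bytes(100000)` is 40,000,000,000 bytes, before multiplying by the number of trials and subjects.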
24. Machine Learning Workload Convergence
[Diagram labels: Usages, Workloads, Building Blocks]
Usages: Education, Health, Banking, Manufacturing, Transportation
Building Blocks: Machine Learning Algorithms, High-level Libraries, Primitives, Low-level Libraries, Hardware Platforms
Hardware Platforms: Xeon, Xeon Phi, Xeon FPGA, Xeon Gfx, Add-in card, New ISA
Intel can help accelerate a wide range of machine learning through a focus on key building blocks.
25. Intel® Math Kernel Library (Intel® MKL)
Random Number Gen.
• Congruential
• Wichmann-Hill
• Mersenne Twister
• Sobol
• Niederreiter
• Non-deterministic
Summary Statistics
• Kurtosis
• Variation coefficient
• Quantiles
• Order statistics
• Min/max
• Variance-covariance
Data Fitting
• Spline-based
• Interpolation
• Cell search
Linear Algebra
• BLAS, Sparse BLAS
• LAPACK solvers
• Sparse Solvers (DSS, PARDISO)
• Iterative solver (RCI)
• ScaLAPACK, PBLAS
Fast Fourier Transforms
• Multidimensional
• FFTW interfaces
• Cluster FFT
• Trig. Transforms
• Poisson solver
• Convolution via VSL
Vector Math
• Trigonometric
• Hyperbolic
• Exponential, Logarithmic
• Power / Root
26. Unveiling Details of Knights Landing (Next Generation Intel® Xeon Phi™ Products)
2nd half ’15: 1st commercial systems
3+ TFLOPS In One Package
Parallel Performance & Density
On-Package Memory: up to 16GB at launch, 5X Bandwidth vs DDR4
Compute: Energy-efficient IA cores
Microarchitecture enhanced for HPC
3X Single Thread Performance vs Knights Corner
Intel Xeon Processor Binary Compatible
1/3X the Space
5X Power Efficiency
Integrated Fabric
Intel® Silvermont Arch. Enhanced for HPC
Processor Package (conceptual, not actual package layout)
Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel® Xeon® Processors
Jointly Developed with Micron Technology
27. FCMA Correlation Computation
[Figure labels: scan data (voxels), scan data (voxels), Correlations]
Need Pearson’s correlation coefficient for each pair of voxels
34,470 voxels => over 500 million pairs
Functionality provided by Intel’s libraries
If scan data is normalized (mean-centered and unit norm) then Pearson correlation becomes matrix multiplication
Can use single-precision general matrix multiplication (SGEMM) built into Intel Math Kernel Library (MKL)
Current work is to improve SGEMM performance when computing with small numbers of scans (e.g. 12)
Thanks to Mike Anderson, Intel Labs
28. FCMA Z-Score Computation
Need to complete Z-score procedure across all correlation matrices produced by a single subject
Fisher transformation of each correlation coefficient => 0.5 * ln((1+x)/(1-x))
Then, at each location in the correlation matrix, subtract the mean and divide by the standard deviation across all correlation matrices
Acceleration using Single Instruction Multiple Data (SIMD) instructions
Correlation coefficients are grouped into contiguous vectors and processed using SIMD instructions to exploit data parallelism
Loop annotated with #pragma simd
Natural logarithm can also be vectorised using Intel Short Vector Math Library (SVML) to accelerate the Fisher transformation
[Figure labels: Correlations, voxels × voxels]
Thanks to Mike Anderson, Intel Labs
29. Putting it all together: FCMA Z-score example
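#include <math.h>       // logf, sqrtf
#include <immintrin.h>  // _mm_prefetch

// Each iteration handles one (voxel, subject) pair; iterations are independent.
#pragma omp parallel for
for (int v = 0; v < step * nSubs; v++)
{
    int s = v % nSubs;  // subject id
    int i = v / nSubs;  // voxel id
    // View this voxel's correlation vectors as a 2-D array with `row` columns.
    float (*mat)[row] = (float (*)[row]) &(voxels->corr_vecs[i * nTrials * row]);
    #pragma simd
    for (int j = 0; j < row; j++)
    {
        float mean = 0.0f;
        float std_dev = 0.0f;  // accumulates the sum of squares first
        for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++)
        {
            // Prefetch elements further along the current row.
            _mm_prefetch((char *)&(mat[b][j + 32]), _MM_HINT_ET1);
            _mm_prefetch((char *)&(mat[b][j + 16]), _MM_HINT_T0);
            // Fisher transformation 0.5 * ln((1 + r) / (1 - r)),
            // clamping both terms so the logarithm stays finite.
            float num = 1.0f + mat[b][j];
            float den = 1.0f - mat[b][j];
            num = (num <= 0.0f) ? 1e-4f : num;
            den = (den <= 0.0f) ? 1e-4f : den;
            mat[b][j] = 0.5f * logf(num / den);
            mean += mat[b][j];
            std_dev += mat[b][j] * mat[b][j];
        }
        mean = mean / (float)nPerSub;
        // Variance as E[x^2] - E[x]^2; its square root is taken below.
        std_dev = std_dev / (float)nPerSub - mean * mean;
        float inv_std_dev = (std_dev <= 0.0f) ? 0.0f : 1.0f / sqrtf(std_dev);
        // Z-score every transformed value of this subject at position j.
        for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++)
        {
            mat[b][j] = (mat[b][j] - mean) * inv_std_dev;
        }
    }
}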
Several MPI processes run the above code
OpenMP divides independent voxels (dim1) and subjects across 60 Xeon Phi cores
The #pragma simd directive assigns consecutive voxels (dim2) to vector lanes
Thanks to Mike Anderson, Intel Labs
30. FCMA SVM
[Figure labels: correlation with voxel v_i; subjects, trials]
Key is to find the most predictive voxels in the correlation matrix
• Rows of the correlation matrix are the feature vectors
Very large number of SVMs are trained
• One for each voxel - O(35000)
• Each trained SVM is cross-validated and the top few voxels are chosen for predictive analyses
Acceleration using custom SVM code
• Kernel matrix precomputed as #dimensions << #data points
• Ported parallel GPUSVM code to run on Xeon and Xeon Phi platforms
• Uses thread-level and SIMD parallelism
• Faster than LIBSVM
Thanks to Narayanan Sundaram, Intel Labs
31. FCMA – Effect of Optimizations
[Bar chart: runtime in seconds (for 17 subjects) for Correlation, Z-score, SVM, and Total on Xeon and Xeon Phi, before and after optimizations]
1.7X speedup on Xeon
5.8X speedup on Xeon Phi
Xeon Phi 2.1X faster than Xeon
Thanks to Yida Wang, Princeton, and Narayanan Sundaram
33. [Diagram labels: stimulus (task), mind, brain, rtfMRI, classifier]
Conclusions from structure of the learnt model (dependent on prediction model): weights on features, hidden layers
Conclusions from feature choice (dependent on experiment): voxel location, voxel behavior, time within trial
Adapted from Francisco Pereira, Botvinick Lab, Princeton
37. Modeling | Topographic Factor Analysis
Manning JR, Ranganath R, Norman KA, Blei DM (2014) Topographic Factor Analysis: A Bayesian Model for Inferring Brain Networks from Neural Data. PLoS ONE 9(5): e94914. doi:10.1371/journal.pone.0094914
38. Modeling | Topographic Factor Analysis
39. Modeling | Topographic Factor Analysis
N trials
V voxels
voxel activations y
K shared sources (µ, λ)
weights w
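The quantities listed on this slide fit together roughly as follows. This is a sketch based on the radial-basis-function formulation in the cited Manning et al. (2014) paper, not a formula shown in the talk: the voxel location r_v, the width symbol λ_k, and the omission of the noise term and priors are assumptions made here for brevity.

```latex
% Activation of voxel v on trial n as a weighted sum of K shared sources
y_{n,v} \;\approx\; \sum_{k=1}^{K} w_{n,k}\, f_k(r_v),
\qquad
% Each source is a radial basis function with center \mu_k and width \lambda_k
f_k(r) \;=\; \exp\!\left( -\frac{\lVert r - \mu_k \rVert^{2}}{\lambda_k} \right)
```

Read this way, the N-by-V matrix of voxel activations is approximated by an N-by-K weight matrix times a K-by-V matrix of source images, which is why K can be far smaller than V.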
40. Modeling | Topographic Factor Analysis
number of sources?
specification of sources?
hyperparameter values?
initialization of sources?
41. Modeling | Topographic Factor Analysis
“Mental state” m_n during the nth trial gives rise to behavioral data b_n and neural data y_n
42. Decoding your thoughts... is a work in progress....
more basic neuroscience research
more machine learning speed and accuracy
a look at other model-based methods
43. Conclusions
Closed-loop rtfMRI amplifies and externalizes internal states that are difficult to access
Holds promise for people who suffer from mental disorders or simply want to improve brain performance
Intel is helping put the rt into rtfMRI and unlock the potential of this research
44. Thanks Princeton Neuroscience Institute!
Jon Cohen — PNI Co-Founder, Professor of Neuroscience and Psychology
Matt Botvinick — Professor of Neuroscience and Psychology
Ken Norman — Professor of Neuroscience and Psychology
Nick Turk-Browne — Professor of Neuroscience and Psychology
Kai Li — Professor of Computer Science and Co-Founder of Data Domain Corporation