Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC

You thought what?! The promise of real-time
brain decoding
Ted Willke
Intel Labs

[Figure with panels labeled BUILDINGS and PEOPLE; Alvarez & Oliva, 2006]

What is attention?
“Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought... It implies withdrawal from some things in order to deal effectively with others...”
– William James (1890)
A simple but important distinction:
• Overt attention: moving your eyes
• Covert attention: moving your mind’s eye
Courtesy of Nick Turk-Browne, Princeton

The great controller
[Diagram relating Attention to Perception, Memory, and Learning]
Courtesy of Nick Turk-Browne, Princeton

The brain: The black box at the end of our necks
• Facts:
  ▪ Only 2% of body weight but uses up to 20% of energy
  ▪ ~200B neurons
  ▪ Neurons fire up to ~10 kHz
  ▪ 1K to 10K connections per neuron
• The cerebral neocortex (the “mammalian brain” associated with higher reasoning):
  ▪ ~20B neurons
  ▪ ~125 trillion synapses
There are more ways to organize the neocortex’s ~125 trillion synapses than stars in the known universe.

[Diagram: stimulus (task), mind, brain, dataset?, computational model?]
What is present in the mind as the task is performed?
What is attended to in the mind as the task is performed?
Adapted from Francisco Pereira, Botvinick Lab, Princeton

Non-invasive neuroimaging
Electrical phenomena:
• Consumer EEG (fewer sensors)
• Lab/Medical EEG (more sensors)
• Magnetoencephalography (MEG)
Metabolic phenomena:
• Positron Emission Tomography (PET)
• Functional Magnetic Resonance Imaging (fMRI)
• Near-Infrared Spectroscopy (fNIRS)
[Figure axis: better spatial resolution]
Varying portability, temporal & spatial resolution. fMRI is the workhorse of brain research despite the disadvantages of non-portability & expense.

Real-Time Functional MRI (rtfMRI)
[Figure: metabolic brain and anatomical brain images]
Adapted from graphic by Jeremy Manning, Princeton

[Diagram: stimulus (task), mind, brain, rtfMRI, classifier]
Conclusions from structure of the learnt model (dependent on prediction model):
▪ weights on features
▪ hidden layers
Conclusions from feature choice (dependent on experiment):
▪ voxel location
▪ voxel behavior
▪ time within trial
Adapted from Francisco Pereira, Botvinick Lab, Princeton

Studying attention | dueling categories
[Plot: % BOLD change over time in the fusiform face area (FFA) and the parahippocampal place area (PPA), for face attention and scene (place) attention]
e.g., O’Craven et al., 1999, Nature

Studying attention | coupling hypothesis
[Diagram: correlation (r) between V4 in occipital cortex and FFA and PPA in ventral temporal cortex]
Al-Aidroos et al., 2012, Proc Natl Acad Sci

Studying attention | coupling hypothesis
[Plot: face attention vs. scene attention; N = 7, *p < .05, **p < .01]
Al-Aidroos et al., 2012, Proc Natl Acad Sci

Standard types of fMRI analysis. (A) Univariate activation refers to the average
amplitude of BOLD activity evoked by events of an experimental condition.
N B Turk-Browne Science 2013;342:580-584
*BOLD: blood oxygenation level–dependent (BOLD) contrast imaging

[Figure annotations: Basic (i.e. common) Analysis; Advanced Analysis: MVPA, FCMA]
*MVPA: Multivariate Pattern Analysis
*FCMA: Full Correlation Matrix Analysis

Offline fMRI image analysis experiment
Pipeline stages: data acquisition, preprocessing, voxel analysis, classifier testing, analyze results
Processing time … 6 to 55 hours
Courtesy of Nick Turk-Browne, Princeton

Real-time brain decoding for causal experimentation

Studying attention | real-time neurofeedback
Attend to scene, beginning from the starting stimulus:
• MORE scene evidence: rewarded with stronger stimulus and easier task
• LESS scene evidence: punished with degraded stimulus and harder task
Courtesy of Nick Turk-Browne, Princeton

Closed-loop rtfMRI neurofeedback system
Pipeline stages: data acquisition, real-time preprocessing, real-time voxel analysis, classifier testing, update stimulus display
Processing time … 6 to 55 hours

Studying attention | training and scoring
Neurofeedback
Use multivariate pattern analysis (MVPA) over whole-brain activity to decode attention to faces vs. scenes
Mean cross-validation accuracy = 78% ***
Norman et al. (2006), LaConte (2011). Regularized logistic regression (penalty = 1), ***p < 0.001

Scoring sequence – your brain on scenes?

This was done with MVPA. We’d also like to try FCMA to include connectivity information, but...

A Big Data/HPC challenge
Some facts:
▪ To keep up with the rtfMRI scanner, we must process a full brain scan and provide feedback in <1 sec (for a 2 sec TR)
▪ Raw image data for 1 subject: ~480 Gbytes
▪ Some studies train on hundreds of subjects
▪ Running correlations across all subjects involves a lot of data movement
▪ Processing is expensive:
  ▪ N ~ 100K voxels per time slice
  ▪ O(N^2) for basic preprocessing (minutes today)
  ▪ O(N^3) to process the full correlation matrix (hours today)
[Figure: raw fMRI data and patterns of correlated voxels]
Image Sources: Princeton Neuroscience Institute and Wikipedia
“Train classifier on hundreds of subjects during coffee break, classify a subject’s patterns in <1 sec.”
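To make the scale concrete, here is a small C sketch (not from the talk) that counts distinct voxel pairs and the size of one dense single-precision correlation matrix. The function names are hypothetical, and 4-byte floats are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct voxel pairs among n voxels: n * (n - 1) / 2. */
uint64_t voxel_pairs(uint64_t n)
{
    return n * (n - 1) / 2;
}

/* Bytes needed for a dense n-by-n single-precision correlation matrix.
 * Assumption: the whole matrix is stored, not just one triangle. */
uint64_t corr_matrix_bytes(uint64_t n)
{
    assert(n <= UINT32_MAX / 2); /* keeps n * n * sizeof(float) within 64 bits */
    return n * n * (uint64_t)sizeof(float);
}
```

With N = 100,000 voxels this gives roughly 5 billion pairs and about 40 GB for one dense matrix. With the 34,470 voxels quoted on the FCMA correlation slide it gives 594,073,215 pairs.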

Machine Learning Workload Convergence
Usages: Education, Health, Banking, Manufacturing, Transportation
Workloads: Machine Learning Algorithms
Building Blocks: High-level Libraries, Primitives, Low-level Libraries
Hardware Platforms: Xeon, Xeon Phi, Xeon FPGA, Xeon Gfx, Add-in card, New ISA
Intel can help accelerate a wide range of machine learning through a focus on key building blocks.

Intel® Math Kernel Library (Intel® MKL)
Random Number Gen.
• Congruential
• Wichmann-Hill
• Mersenne Twister
• Sobol
• Niederreiter
• Non-deterministic
Summary Statistics
• Kurtosis
• Variation coefficient
• Quantiles
• Order statistics
• Min/max
• Variance-covariance
Data Fitting
• Spline-based
• Interpolation
• Cell search
Linear Algebra
• BLAS, Sparse BLAS
• LAPACK solvers
• Sparse Solvers (DSS, PARDISO)
• Iterative solver (RCI)
• ScaLAPACK, PBLAS
Fast Fourier Transforms
• Multidimensional
• FFTW interfaces
• Cluster FFT
• Trig. Transforms
• Poisson solver
• Convolution via VSL
Vector Math
• Trigonometric
• Hyperbolic
• Exponential, Logarithmic
• Power / Root

Unveiling Details of Knights Landing
(Next Generation Intel® Xeon Phi™ Products)
2nd half ’15: 1st commercial systems
3+ TFLOPS In One Package
Parallel Performance & Density
On-Package Memory:
▪ up to 16GB at launch
▪ 5X Bandwidth vs DDR4
Compute: Energy-efficient IA cores
▪ Microarchitecture enhanced for HPC
▪ 3X Single Thread Performance vs Knights Corner
▪ Intel Xeon Processor Binary Compatible
▪ 1/3X the Space
▪ 5X Power Efficiency
Integrated Fabric
Intel® Silvermont Arch. Enhanced for HPC
Processor Package (Conceptual, Not Actual Package Layout)
Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel® Xeon® Processors
Jointly Developed with Micron Technology

FCMA Correlation Computation
[Figure: scan data matrices and the resulting voxels-by-voxels correlation matrix]
Need Pearson’s correlation coefficient for each pair of voxels
▪ 34,470 voxels => over 500 million pairs
Functionality provided by Intel’s libraries
▪ If scan data is normalized (mean-centered and unit norm), then Pearson correlation becomes matrix multiplication
▪ Can use single-precision general matrix multiplication (SGEMM) built into Intel Math Kernel Library (MKL)
▪ Current work is to improve SGEMM performance when computing with small numbers of scans (e.g. 12)
Thanks to Mike Anderson, Intel Labs

FCMA Z-Score Computation
Need to complete the Z-score procedure across all correlation matrices produced by a single subject
▪ Fisher transformation of each correlation coefficient x => 0.5 * ln((1+x)/(1-x))
▪ Then, at each location in the correlation matrix, subtract the mean and divide by the standard deviation across all correlation matrices
Acceleration using Single Instruction Multiple Data (SIMD) instructions
▪ Correlation coefficients are grouped into contiguous vectors and processed using SIMD instructions to exploit data parallelism
▪ Loop annotated with #pragma simd
▪ Natural logarithm can also be vectorised using Intel Short Vector Math Library (SVML) to accelerate Fisher transformation
[Figure: voxels-by-voxels correlation matrices]
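The two steps above can be written out for a single location in the correlation matrix. The indexing is an assumption for illustration: x_b is the correlation coefficient at that location in the b-th of the subject's B correlation matrices. The variance is written in the one-pass form that the loop on the next slide computes.

```latex
z_b = \tfrac{1}{2}\,\ln\!\left(\frac{1 + x_b}{1 - x_b}\right), \qquad
\mu = \frac{1}{B}\sum_{b=1}^{B} z_b, \qquad
\sigma^2 = \frac{1}{B}\sum_{b=1}^{B} z_b^2 - \mu^2, \qquad
\hat{z}_b = \frac{z_b - \mu}{\sigma}
```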

Putting it all together: FCMA Z-score example
// Requires <math.h> (logf, sqrtf) and <immintrin.h> (_mm_prefetch)
#pragma omp parallel for
for (int v = 0; v < step * nSubs; v++)
{
    int s = v % nSubs; // subject id
    int i = v / nSubs; // voxel id
    // mat[b][j]: correlation of voxel i with voxel j in trial b
    float (*mat)[row] = (float (*)[row]) &(voxels->corr_vecs[i * nTrials * row]);
    #pragma simd
    for (int j = 0; j < row; j++)
    {
        float mean = 0.0f;
        float std_dev = 0.0f;
        // Pass 1: Fisher-transform in place, accumulate sum and sum of squares
        for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++)
        {
            _mm_prefetch((char*)&(mat[b][j + 32]), _MM_HINT_ET1);
            _mm_prefetch((char*)&(mat[b][j + 16]), _MM_HINT_T0);
            float num = 1.0f + mat[b][j];
            float den = 1.0f - mat[b][j];
            // Clamp so that a correlation of +/-1 cannot give log(0) or divide by zero
            num = (num <= 0.0f) ? 1e-4f : num;
            den = (den <= 0.0f) ? 1e-4f : den;
            mat[b][j] = 0.5f * logf(num / den);
            mean += mat[b][j];
            std_dev += mat[b][j] * mat[b][j];
        }
        mean = mean / (float)nPerSub;
        // Variance as E[z^2] - mean^2 (variable name kept from the slide)
        std_dev = std_dev / (float)nPerSub - mean * mean;
        float inv_std_dev = (std_dev <= 0.0f) ? 0.0f : 1.0f / sqrtf(std_dev);
        // Pass 2: subtract the mean and divide by the standard deviation
        for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++)
        {
            mat[b][j] = (mat[b][j] - mean) * inv_std_dev;
        }
    }
}
} // closes the enclosing function (signature not shown on the slide)
▪ Several MPI processes running the above code
▪ OpenMP divides independent voxels (dim1) and subjects across 60 Xeon Phi Cores
▪ #pragma simd directive assigns consecutive voxels (dim2) to vector lanes

FCMA SVM
[Figure: correlation with voxel v_i across subjects and trials]
Key is to find the most predictive voxels in the correlation matrix
• Rows of the correlation matrix are the feature vectors
A very large number of SVMs are trained
• One for each voxel, on the order of 35,000
• Each trained SVM is cross-validated and the top few voxels are chosen for predictive analyses
Acceleration using custom SVM code
• Kernel matrix precomputed as #data points << #dimensions
• Ported parallel GPUSVM code to run on Xeon and Xeon Phi platforms
• Uses thread-level and SIMD parallelism
• Faster than libSVM
Thanks to Narayanan Sundaram, Intel Labs
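The precomputed kernel matrix can be sketched in C for the simplest case. A linear kernel is an assumption here, since the slide does not name the kernel, and the function name is hypothetical. The matrix has n * n entries for n samples regardless of the feature dimension d, so the SVM solver can reuse it without touching the long feature vectors again.

```c
#include <assert.h>
#include <stddef.h>

/* Fill the n-by-n linear kernel (Gram) matrix k[i][j] = <x_i, x_j> for n
 * samples with d features each. x is row-major (n rows, d columns). Only the
 * upper triangle is computed; symmetry fills in the lower triangle. */
void linear_kernel_matrix(const float *x, float *k, size_t n, size_t d)
{
    assert(x != NULL && k != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i; j < n; j++) {
            float dot = 0.0f;
            for (size_t f = 0; f < d; f++)
                dot += x[i * d + f] * x[j * d + f];
            k[i * n + j] = dot;
            k[j * n + i] = dot;
        }
    }
}
```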

FCMA – Effect of Optimizations
[Bar chart: runtime in seconds (for 17 subjects), before and after optimizations, for the Correlation, Z-score, SVM, and Total stages on Xeon and Xeon Phi]
1.7X speedup on Xeon
5.8X speedup on Xeon Phi
Xeon Phi 2.1X faster than Xeon
Thanks to Yida Wang, Princeton, and Narayanan Sundaram

[Diagram: stimulus (task), mind, brain, rtfMRI, classifier]

[Diagram: stimulus (task), mind, brain, rtfMRI, model, predicted stimulus or task]

[Diagram: stimulus (task), mind, brain, rtfMRI, model, predicted rtfMRI data]

Modeling | Topographic Factor Analysis
Manning JR, Ranganath R, Norman KA, Blei DM (2014) Topographic Factor Analysis: A Bayesian Model for Inferring Brain
Networks from Neural Data. PLoS ONE 9(5): e94914. doi:10.1371/journal.pone.0094914
http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0094914

▪ N trials
▪ V voxels
▪ voxel activations y
▪ K shared sources (µ, λ)
▪ weights w
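One way to read the quantities listed above is as a weighted sum of spatial sources. This is a sketch under stated assumptions, not a reproduction of the cited paper's equations: each source is assumed to be a radial basis function with center µ_k and width λ_k, r_v is assumed to be the location of voxel v, and σ² is an assumed noise variance.

```latex
y_{n,v} \sim \mathcal{N}\!\left(\sum_{k=1}^{K} w_{n,k}\, f_v(\mu_k, \lambda_k),\; \sigma^2\right),
\qquad
f_v(\mu, \lambda) = \exp\!\left(-\frac{\lVert r_v - \mu \rVert^2}{\lambda}\right)
```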

▪ number of sources?
▪ specification of sources?
▪ hyperparameter values?
▪ initialization of sources?

“mental state” m_n during the nth trial gives rise to behavioral data b_n and neural data y_n

Decoding your thoughts... is a work in progress....
▪ more basic neuroscience research
▪ more machine learning speed and accuracy
▪ a look at other model-based methods

Conclusions
▪ Closed-loop rtfMRI amplifies and externalizes internal states that are difficult to access
▪ Holds promise for people who suffer from mental disorders or simply want to improve brain performance
▪ Intel is helping put the rt into rtfMRI and unlock the potential of this research

Thanks Princeton Neuroscience Institute!
Jon Cohen — PNI Co-Founder, Professor of Neuroscience and Psychology
Matt Botvinick — Professor of Neuroscience and Psychology
Ken Norman — Professor of Neuroscience and Psychology
Nick Turk-Browne — Professor of Neuroscience and Psychology
Kai Li — Professor of Computer Science and Co-Founder of Data Domain Corporation
