PMED Transition Workshop - Machine Learning Methods to Learn Improved Electrophysiological Biomarkers in Clinical Trials - David Carlson, May 21, 2019
1. Machine Learning Methods to Learn
Improved EEG Biomarkers in Clinical Trials
David Carlson
Assistant Professor
Dept. of Civil & Environmental Engineering
Dept. of Biostatistics and Bioinformatics
Duke Clinical Research Institute
2. Learning Interpretable Neural Biomarkers for
Clinical Conditions or Outcomes
• Neural activity is frequently used to try to understand the basis of
neuropsychiatric disorders and effects of treatment
• Neural activity is complex
• Can we use machine learning to break the complex signals into interpretable
patterns?
• Are these signals related to susceptibility or treatment outcomes?
• Can they be used to develop novel treatments?
[Figure: learned electome networks. Heatmaps over brain regions (BLA, CeA, IL_Cx, NAc, PrL_Cx, VSub, VTA) and frequencies (0-50 Hz) for networks in the 2-8 Hz, 8-20 Hz, 2-6 Hz, and 12-20 Hz bands; network scores for Non-Stressed, Resilient, and Susceptible mice (HC, FIT-Empty, FIT-CD1) pre- and post-chronic stress; VSub spectral-density traces; train/test split of EEG recordings. Figure text: it is easier to classify the subject ID than the stage, which poses two problems: does the learned model overfit the data, and can it generalize to new patients? Spans animal studies and clinical trials.]
3. Can machine learning help understand neural
circuits?
• Ongoing collaborations with Laboratory for
Psychiatric Neuroengineering led by Kafui
Dzirasa
• Trying to understand how the brain acts and
changes in animal models of neuropsychiatric
disorders
• Especially interested in early biomarkers:
• e.g., susceptibility for depression prior to any
behavioral signs
• Can we find this type of information in
neural signals alone?
Kafui Dzirasa
4. Depression (MDD)
(1) depressed mood
(2) diminished interest
(3) increase or decrease in appetite
(4) hypersomnia or insomnia
(5) psychomotor agitation or retardation
(6) fatigue or loss of energy
(7) feelings of worthlessness or excessive or inappropriate guilt
(8) diminished ability to think or concentrate
(9) recurrent thoughts of death
Most debilitating illness in the world (WHO, 2017)
5. Normal Function
Is MDD prevention a viable therapeutic
strategy?
AGING
Digoxin
Pacemaker
Heart Attack/Heart Failure
Blood Pressure/Cholesterol
6. Normal Function (Resilience)
Is MDD prevention a viable therapeutic
strategy?
Severe Stress
Antidepressants
MDD (Susceptibility)
Vulnerability
Can neural biomarkers signal vulnerability?
11. Describing neural activity as an electrical
connectome (Electome)
Interpretable patterns of neural activity
12. Break observed signals into electomes
• Each latent function is represented by one of our networks,
and the scores determine how much each network is expressed.
Gallagher et al. NIPS 2017
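The networks-times-scores structure above can be illustrated with a toy factorization. This is a minimal sketch, not the actual cross-spectral factor analysis of Gallagher et al.: plain nonnegative matrix factorization on synthetic spectral features, with all sizes and data invented for illustration.

```python
import numpy as np

# Toy sketch of the electome idea: approximate windowed spectral features
# as a nonnegative combination of a few shared "networks" weighted by
# per-window scores. Plain NMF stands in for the real method (CSFA).
rng = np.random.default_rng(0)

n_windows, n_features, n_networks = 200, 40, 3
true_nets = rng.random((n_networks, n_features))    # shared patterns
true_scores = rng.random((n_windows, n_networks))   # expression per window
X = true_scores @ true_nets                         # observed features

# Multiplicative-update NMF: X ~= S @ N with S, N >= 0
S = rng.random((n_windows, n_networks)) + 0.1
N = rng.random((n_networks, n_features)) + 0.1
for _ in range(500):
    S *= (X @ N.T) / (S @ N @ N.T + 1e-9)
    N *= (S.T @ X) / (S.T @ S @ N + 1e-9)

err = np.linalg.norm(X - S @ N) / np.linalg.norm(X)
print(f"relative reconstruction error: {err:.3f}")
```

Each row of `N` plays the role of one network; each row of `S` says how strongly each network is expressed in one time window.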
13. Case Study: Mouse Model of Depression
• 44 mice used in a behavioral study; LFPs recorded during:
• Home cage
• Novel environment (FIT-Empty)
• Forced interaction test (FIT-CD1)
• Non-control mice go through a chronic stress paradigm
• Learn 20 features (electomes)
• Use features to predict:
• Behavioral condition
• Pre- and post-stress conditions
• Stress resiliency (i.e., is the mouse depressed after chronic stress?)
17. Does this pattern hold up?
• Molecular model of vulnerability
• Sidekick 1
• Physiological model of vulnerability
• Interferon alpha
• Behavioral model of vulnerability
• Childhood Trauma
• Genetic Risk Factor
• The learned electomes repeatedly hold up across many different types
of vulnerability signals
Hultman et al. Cell 2018
18. Can we adapt to clinical trials and
studies on humans?
19. Large-Scale Studies and Trials
• Can we take these same ideas and make
them work on human data?
• Can we help improve understanding of
the brain and responses in clinical
interventions?
• Large-scale data is being built: NIH
recently awarded an Autism Center of
Excellence to support large-scale studies
and clinical trials in Autism Spectrum
Disorder (ASD) and Attention Deficit
Hyperactivity Disorder (ADHD)
Geri Dawson
20. Machine-learning approaches to EEG-based
biomarkers in ASD clinical trials
• Utilized data from an open-label trial of 25 children aged 2-7 years who received a
single infusion of autologous umbilical cord blood. Behavioral, EEG, and MRI data were
collected at baseline, 6 months, and 12 months post-infusion.
• Analyzed this initial cohort of 25 children to try to answer two questions:
• Can we track EEG changes that happen after treatment?
• Can we predict who will/will not show changes after treatment?
• First developed an interpretable machine learning framework
• Then built a computational framework to address “little big data”
• All developed frameworks are going to be used to help analyze much larger
closed-label trials that just finished or are underway (Duke Autism Center of
Excellence)
22. Transparent Machine Learning for EEG
• We have developed a custom
Convolutional Neural Network (CNN)
approach for electroencephalography
(EEG) data, SyncNet
• Learns a mapping to a pseudo-input
space; able to incorporate studies with
differing electrode layouts into the
same analysis (GP Adapter)
• Is this CNN interpretable?
Li et al, NeurIPS 2017
23. Shallow Convolutional Neural Network
• K different filter banks of 1D convolutional filters (1st, 2nd, 3rd, ...)
• Max pooling over each bank's output yields one extracted (scalar) feature per bank, $s_i \in \mathbb{R}^K$
• Only a single convolutional layer
• Can view this as a nonlinear feature extraction step.
24. Viewing our CNN as a
Nonlinear Feature Extraction
Nonlinear
Feature
Extraction
Logistic
Regression
Low-Dimensional
Extracted Feature
Can interpret logistic
regression weights.
Can we understand and interpret the
extracted features?
25. Parameterized Convolutional Filters
• SyncNet uses parameterized convolutional filters based on Morlet
wavelets:
$f_c^{(k)}(\tau) = b_c^{(k)} \cos\!\big(\omega^{(k)} \tau + \phi_c^{(k)}\big)\, e^{-\beta^{(k)} \tau^2}$
• $c, k$ are the channel (which electrode) and filter index, respectively
• $\omega^{(k)}$ and $\beta^{(k)}$ control frequency properties
• $b_c^{(k)}$ and $\phi_c^{(k)}$ are channel-specific amplitudes and phase shifts
• Frequency properties are well understood from wavelets, so we can
borrow that knowledge
26. Extracting Features
• Can extract the output of the convolutional
filters:
$\mathbf{h}_i^{(k)} = \sum_c x_{ic} * f_c^{(k)}(\tau), \qquad \tilde{h}_i^{(k)} = \max(\mathbf{h}_i^{(k)})$
• This is a max pooling over the complete
time window
• Each convolutional filter bank is reduced to a
single output
• K distinct filter banks will convert an EEG
window into K features
• Can view this as a nonlinear feature
extractor
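The parameterized filters and max pooling on this and the previous slide can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the released SyncNet code; the filter support, window length, and parameter values are assumptions.

```python
import numpy as np

def morlet_filter(b, omega, phi, beta, taus):
    # f_c^{(k)}(tau) = b * cos(omega*tau + phi) * exp(-beta*tau^2)
    return b * np.cos(omega * taus + phi) * np.exp(-beta * taus ** 2)

def syncnet_features(x, banks):
    # x: (channels, time); banks: list of K filter banks, each with
    # per-channel amplitudes b[c] and phases phi[c], plus shared omega, beta.
    C, T = x.shape
    taus = np.arange(-20, 21) / 100.0   # filter support in seconds (assumed)
    feats = []
    for p in banks:
        h = np.zeros(T + len(taus) - 1)
        for c in range(C):
            f = morlet_filter(p["b"][c], p["omega"], p["phi"][c], p["beta"], taus)
            h += np.convolve(x[c], f)   # sum filter outputs over channels
        feats.append(h.max())           # max pooling over the whole window
    return np.array(feats)              # K scalar features per window

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 100))           # toy window: 3 channels, 100 samples
banks = [{"b": rng.random(3), "omega": 2 * np.pi * f0,
          "phi": 2 * np.pi * rng.random(3), "beta": 50.0}
         for f0 in (4, 8, 12, 20)]      # K = 4 banks at assumed frequencies
feats = syncnet_features(x, banks)
print(feats.shape)
```

In the real model these parameters are learned by backpropagation; here they are fixed at random values just to show the filter-then-pool feature extraction.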
27. Learning Treatment Biomarkers
• Want to learn neural dynamics that change post-treatment
• To evaluate this, we attempt to classify the treatment stage of the autologous
umbilical cord blood clinical trial (0 months/baseline, 6 months post-treatment,
and 12-months post-treatment) using EEG alone
• Proof-of-concept for applying the methodology to larger datasets and for learning
diagnostic classification and treatment-efficacy biomarkers from EEG signals
• No controls yet
• Closed-label clinical trial (N=180) (placebo vs. treatment) just finished and will be analyzed
soon.
28. Convolutional Filter Visualization
Figure: One of the ten features learned in the neural network. (Left) This figure shows relative power in
arbitrary units (a.u.) defined by the learned variables in the network. (Middle) This figure shows the frequency
range used by the learned filter defined by the learned variables. (Right) To demonstrate the effect of the
learned feature, one can visualize its value from each data sample.
29. Does this learn better biomarkers?
• Evaluated by leave-one-participant-out
cross-validation, averaged over windows
• Predict one of (baseline, 6
months post-treatment, 12
months post-treatment)
• Most common analysis is based
on Power Spectral Density
features
Accuracy (%)
Random Guessing: 33.3
Dominant Class: 41.1
Diff. Ent. + SVM: 50.4
Power Spec. Density + SVM: 49.9
MC-DCNN*: 58.4
SyncNet*: 60.1
*Custom CNN-based approaches
30. Windowing data makes it seem much bigger
than it really is
• Standard practice is to predict over sub-segments of EEG data, which
makes N seem large (540 labeled “samples” per patient without
missing data)
• Only have 25 participants (22 with usable data)
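The leave-one-participant-out split used in the evaluation above can be sketched from scratch: all windows from one participant form the test fold, so windowed "samples" from one person never appear in both train and test. Names and sizes here are illustrative, not the talk's actual pipeline.

```python
import numpy as np

def leave_one_participant_out(participant_ids):
    # Yield (train_indices, test_indices) pairs, one fold per participant,
    # so no participant's windows are split across train and test.
    ids = np.asarray(participant_ids)
    for pid in np.unique(ids):
        test = ids == pid
        yield np.where(~test)[0], np.where(test)[0]

# Toy example: 3 participants with 4 EEG windows each
pids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
folds = list(leave_one_participant_out(pids))
print(len(folds))
```

Splitting windows at random instead would let the model memorize participant identity and report inflated accuracy, which is exactly the "little big data" pitfall this slide describes.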
31. “Little Big Data”
One primary goal was to
set up and address the
structure of data collection:
• Many repeats and labels
(i.e., high N), especially
true when looking at
instantaneous EEG
• Only a few participants
to represent the entire
population
This poses two problems:
• Does the learned model overfit the data?
• Can the learned model be generalized to new patients?
[Figure: Example of Domain Adaptation; EEG recordings from several participants are split into train and test sets.]
32. Biological and Medical
Measurements are
Heterogeneous
Often measurements have more differences
between individuals than between
outcomes/labels.
On the right is a t-SNE representation of a
behavioral test of a mouse model of ASD.
Separate mice are coded by a different color.
Each point is one observation from a single
mouse.
33. Do we learn participant-specific features?
• If we train SyncNet to predict
class, it learns participant-specific
patterns
• The confusion matrix on the right
takes the extracted features from
SyncNet and predicts participant
ID
• High accuracy despite not being
trained for this task
• Can we tell the network it
shouldn’t do this?
34. An Existing Approach: Domain Adversarial
Neural Networks
Nonlinear
Feature
Extraction
Logistic
Regression
Low-Dimensional
Extracted Feature
Label Classifier
Domain Classifier
While the network is being learned:
• One classifier to predict label
• One classifier to predict domain
• The features should work well to predict
the label but trick the domain classifier
Ganin et al, 2016
Removes participant-specific information in the feature
space!
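The adversarial setup above can be sketched with linear maps and hand-written logistic gradients: both heads descend their own losses, while the shared feature map descends the label loss but ascends the domain loss (gradient reversal). This is a toy illustration of the idea in Ganin et al. (2016), not the actual DANN architecture; the data, dimensions, and learning rates are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n, d, k = 400, 10, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)      # label carried by feature 0
dom = (X[:, 1] > 0).astype(float)    # domain (participant) carried by feature 1

W = rng.normal(scale=0.1, size=(d, k))   # shared feature extractor
wy = np.zeros(k)                         # label classifier head
wd = np.zeros(k)                         # domain classifier head
lr, lam = 0.2, 1.0
for _ in range(1000):
    Z = X @ W
    gy = (sigmoid(Z @ wy) - y) / n       # d(label loss)/d(logit)
    gd = (sigmoid(Z @ wd) - dom) / n     # d(domain loss)/d(logit)
    wy -= lr * Z.T @ gy                  # label head minimizes label loss
    wd -= lr * Z.T @ gd                  # domain head minimizes domain loss
    # gradient reversal: features minimize label loss, maximize domain loss
    W -= lr * (X.T @ (gy[:, None] * wy[None, :])
               - lam * X.T @ (gd[:, None] * wd[None, :]))

label_acc = (((X @ W @ wy) > 0) == (y > 0.5)).mean()
dom_acc = (((X @ W @ wd) > 0) == (dom > 0.5)).mean()
print(label_acc, dom_acc)
```

After training, the features should remain predictive of the label while giving the domain classifier little to work with, which is the "remove participant-specific information" behavior described above.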
35. Problem solved?
• Removes participant-specific
information
• Learned features much less predictive of
identity
• Makes the network perform worse
• The assumption requires that all
participants are the same in the feature
space
• Works well in images and text data
• Bad assumption in medical data—every
child is unique!
Accuracy (%)
Random Guessing: 33.3
Dominant Class: 41.1
Diff. Ent. + SVM: 50.4
Power Spec. Density + SVM: 49.9
MC-DCNN: 58.4
SyncNet: 60.1
SyncNet+DANN: 58.7
36. A less stringent assumption
• Instead we require:
• Every participant's feature space is similar to a weighted superposition of the
other participants'
• Succinctly, you need to be similar to at least one other person, but not
everyone!
• This can also be trained in an adversarial framework
• The network has the following properties:
• The label prediction tries to predict well on the labels/outcomes
• The domain classifier tries to predict which individual, but the loss is modified
to only penalize if you can differentiate between similar individuals
• The learned features try to do well on label prediction and trick the domain
classifier
Li et al, NeurIPS 2018, AISTATS 2019
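The weighted-superposition assumption can be illustrated with a toy check (not the adversarial training itself): each participant's feature centroid should be well approximated by a similarity-weighted average of the other participants' centroids, even when the population splits into dissimilar cliques. The cliques, dimensions, and softmax temperature here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two "cliques" of participants whose features resemble each other
# within a clique but not across cliques.
centroids = np.vstack([rng.normal(0, 0.1, size=(4, 5)),   # clique A
                       rng.normal(3, 0.1, size=(4, 5))])  # clique B

def superposition_error(centroids, i, temp=0.1):
    # Approximate participant i's centroid as a similarity-weighted
    # superposition of the other participants' centroids.
    others = np.delete(centroids, i, axis=0)
    dists = np.linalg.norm(others - centroids[i], axis=1)
    w = np.exp(-dists / temp)
    w /= w.sum()                      # weights concentrate on similar people
    return float(np.linalg.norm(w @ others - centroids[i]))

errs = [superposition_error(centroids, i) for i in range(len(centroids))]
print(max(errs))
```

A pooled-population assumption (DANN) would force both cliques into one distribution; the superposition view only asks that each person resemble their nearest neighbors, which the small errors above reflect.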
37. Accomplished by Adversarial Learning
Celebrity face generation (Karras et al., 2018); text-to-bird generation (Zhang et al., 2016).
Adversarial learning is more commonly associated with generating fake images…
38. Learning Participant Relationships
• Instead, we want to learn relationships between
participants
• Should have similar features to a few similar
participants
• As data size increases, we want to find cliques of
similar individuals
• Large, practically important gains on the Autism
Center data
[Figure: The relationships learned by the proposed approach. Similar subjects are similar in the feature space, the features are good at predicting treatment, and once we have these features we can calculate distances between any two subjects.]
Li et al., NeurIPS 2018; Li et al., AISTATS 2019
39. Does Multiple Domain Adaptation Help Us?
• Combining our previous interpretable neural
network approach (SyncNet) with our
Multiple Domain Matching Network (MDMN)
yields significant gains
• Statistically significant improvement (p=.002,
Wilcoxon signed-rank test)
• These types of biomarkers can be used to
explore how the brain is changing post-
treatment
Accuracy (%)
Random Guessing: 33.3
Dominant Class: 41.1
Diff. Ent. + SVM: 50.4
Power Spec. Density + SVM: 49.9
MC-DCNN: 58.4
SyncNet: 60.1
SyncNet+DANN: 58.7
SyncNet+MDMN: 67.8
40. Universal Device Mapping
• A real issue in using neurally
derived biomarkers is that
different research groups and
different hospitals use distinct
recording devices and electrode
layouts
• Can we learn a universal
mapping to a pseudo-input
space?
[Figure: mapping from electrode locations $p$ to shared pseudo-inputs $p^*$.]
41. Gaussian Process Adapter
• A Gaussian process is a non-parametric
method where every finite set of points is
described by a multivariate normal
• We define a set of pseudo-inputs $p^*$
• Can infer the EEG signal at the
pseudo-input locations by using a
conditional Gaussian distribution with
$\mathrm{cov}(p_c, p_{c'}) = k_\theta(p_c, p_{c'})$
• Really just a probabilistic interpolation
• Can learn the pseudo-input locations
• Can learn the covariance kernel parameters
Li & Marlin, 2016; Li et al., 2017
Proof-of-concept on emotion recognition:
Dataset: SyncNet / GP-SyncNet / Joint
DEAP: 0.52±.03 / 0.56±.02 / 0.60±.02
SEED: 0.77±.01 / 0.76±.01 / 0.78±.01
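The "probabilistic interpolation" view can be sketched directly: condition a GP on one device's electrode locations and read the posterior mean off at shared pseudo-input locations. This is a 1-D toy with an assumed RBF kernel and lengthscale, not the actual GP-adapter implementation (which also learns the kernel parameters and pseudo-input locations).

```python
import numpy as np

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between 1-D location vectors (assumed form)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_interpolate(p, y, p_star, noise=1e-2):
    # Posterior (conditional Gaussian) mean of the GP at pseudo-inputs p_star,
    # given noisy observations y at this device's electrode locations p.
    K = rbf(p, p) + noise * np.eye(len(p))   # observed-location covariance
    Ks = rbf(p_star, p)                      # cross-covariance
    return Ks @ np.linalg.solve(K, y)

p = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])   # one device's electrode layout
y = np.sin(2 * np.pi * p)                      # one time-slice of signal
p_star = np.linspace(0, 1, 11)                 # shared pseudo-input grid
y_star = gp_interpolate(p, y, p_star)
```

Two devices with different layouts can each be mapped onto the same `p_star` grid this way, so their recordings become comparable inputs to one network.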
42. Moving to Large-Scale Trials
• A Stage 2 trial (DukeACT) of the same treatment was recently
completed (N=180)
• Two groups in this study:
• Treatment at 0 months, placebo at 6 months
• Placebo at 0 months, treatment at 6 months
• Can ask multiple questions:
• What is the treatment effect?
• Can we do predictive medicine and predict treatment response?
• Analysis of studies and trials from the NIH
Autism Center of Excellence in ASD+ADHD will
begin in the near future
43. Incorporating Behavior Information
• Large computer vision effort to
automatically derive behavioral features
(led by Guillermo Sapiro’s group)
• Autism and Beyond App
• Modified all future data collection to
simultaneously collect synchronized EEG +
video
• Can we combine EEG + automatically
derived features to create stronger
digital phenotypes?
• ASD vs. Typically Developing:
• EEG alone: .75 AUC
• Behavior alone: .68 AUC
• EEG+Behavior: .88 AUC
Visualization of our feature
extraction from our computer vision
system currently in use in data
collection.
Guillermo Sapiro
Isaev, under review.
Dmitry Isaev
45. Many Other Uses in Clinical Settings
• Neonatal hypoxic injury and seizures
• Can these types of algorithms improve early
detection and allow faster treatment?
Dmitry Tchapyjnikov, MD
Accuracy of diagnosing a seizure
based on clinical signs in infants
Number of infants who develop
seizures when undergoing
hypothermia therapy after
traumatic birth
50. Are we learning patterns of
neural control?
Can actually test in animal models.
51. Interpretable networks create testable
hypotheses
[Figure: the neural Closed Loop Actuator for Synchronizing Phase (nCLASP) system. Real-time LFP phase analysis (3-7 Hz IL LFP, 30 kHz data acquisition) triggers a 60 ms TTL signal to a waveform generator, driving phase-triggered light pulses through a stimulating fiber in Thal; evoked-potential panels compare -DIO-ChETA and -EYFP animals.]
Optogenetic Validation of Learned Features
Hultman et al. Neuron 2016
52. Real-Time Mapping and Control
[Figure: learned electome networks used for real-time mapping (repeated from earlier): region-by-frequency heatmaps (BLA, CeA, IL_Cx, NAc, PrL_Cx, VSub, VTA; 0-50 Hz) for networks in the 2-8 Hz, 8-20 Hz, 2-6 Hz, and 12-20 Hz bands, with network scores for Non-Stressed, Resilient, and Susceptible mice (HC, FIT-Empty, FIT-CD1) pre- and post-chronic stress, and VSub spectral-density traces.]
[Embedded excerpt (Gallagher et al.):
Figure 1: Left: example local field potential data from seven distinct brain regions, segmented into 5-second time windows. Right: graphical model for the joint generative/discriminative dCSFA.
…where, for any location $x$ in the input space, the process is defined by the mean function $m(x) \in \mathbb{R}^R$, which is often set to equal 0 without loss of generality, and the covariance function $(K(x, x'; \theta))_{r,r'} = k_{r,r'}(x, x'; \theta) \triangleq \mathrm{cov}(f_r(x), f_{r'}(x'))$, which defines how input location $x$ in region $r$ covaries with input location $x'$ in region $r'$.
Any finite set of observations $Y = [y_1, \ldots, y_N]$ at input locations $x = [x_1, \ldots, x_N]^T$ are represented by a multivariate normal distribution, and the parameters $\theta$ may be optimized to fit the observations by maximizing the marginal likelihood
$\theta^* = \arg\max_\theta \log p(Y \mid X, \theta), \qquad p(Y \mid X, \theta) = \mathcal{N}\!\left(\mathrm{vec}(Y^T);\, 0,\, K + \eta^{-1} I_{NR}\right),$
where $\mathrm{vec}(\cdot)$ is a column-wise vectorization of its matrix-valued argument and $K \in \mathbb{R}^{NR \times NR}$ is a Gram matrix defined by the covariance kernel evaluated at input and output locations associated with $\mathrm{vec}(Y^T)$. The form of the covariance kernel constrains the types of posterior functions that may be represented by the Gaussian process. Recently, expressive covariance kernels have been explored [31, 32, 28] that are capable of representing any stationary kernel while treating $\theta$ as expressive features of interest extracted from the model.
2.2 The cross-spectral mixture (CSM) kernel
The multi-region electrophysiological recordings are quasi-periodic signals with quasi-stationary cross-spectral densities [28]. We assume that within a single task (i.e., sliding window of data) the observations may be represented by a stationary Gaussian process. Within this task, the data contain frequency-dependent power and phase synchrony between brain regions. This cross-coupling is believed to facilitate communication of information between connected brain regions.
Designed to capture these expressive features, the cross-spectral mixture (CSM) kernel [28] is given…]
[Figure (repeated): the nCLASP closed-loop system, with phase-locking statistics for the phase-triggered stimulation (Z = 256, P = 9x10^-144).]
Can map in real time by approximating with
neural networks.
Bidirectional Neural Interface
Currently underway to evaluate treatment in an aggressive mouse model.
53. Conclusions
• Clinical trial data is an exciting route to develop and apply machine
learning tools
• Our focus on neural signals has produced strong evidence, in a small
cohort from an early-stage clinical trial, that neural measurements relate
to clinically relevant variables.
• Large-scale trials and observational studies are currently underway.
54. Acknowledgements & Funding
• PhD Students:
• Yitong Li
• William Carson
• Neil Gallagher
• Austin Talbot
• Laboratory for Psychiatric
Neuroengineering
• Kafui Dzirasa
• Rainbo Hultman
• Steven Mague
• Duke Center for Autism and Brain
Development
• Geri Dawson
• Michael Murias
• Sam Major
• National Institutes of Health
• W.M. Keck Foundation
• Stylli Foundation