Modeling decision making deficits in frontostriatal disorders using reinforcement learning

Modeling decision making deficits in frontostriatal disorders
Michael Frank
Laboratory for Neural Computation and Cognition
Brown University

Computational Psychiatry and...
Neurogenocomputomics
• Many disorders broadly characterized by changes in motivation
• Several fronto-striatal disorders have substantial genetic heritability
• Individual differences in reinforcement learning?

• But... Candidate gene effects are generally small
• Which genes? Which task? Which measure?
• Need theoretical model! (and converging pharmacology/imaging)
Frank & Fossella, 2011; Maia & Frank, 2011; Huys et al, 2011

Reinforcement learning and dopamine: prediction errors
[Figure: dopamine responses to positive and negative prediction errors (PE)]
Montague, Dayan & Sejnowski 96; Doya, 2002; O’Reilly, Frank, Hazy & Watz 06...
$\delta(t) = r(t) + \gamma \hat{V}(t+1) - \hat{V}(t)$
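The prediction error on this slide (reward plus discounted next-state value estimate, minus the current value estimate) can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the learning rate `alpha`, the value of `gamma`, and the three-state chain are assumptions introduced here.

```python
def td_error(reward, value_now, value_next, gamma=0.9):
    """Prediction error: delta(t) = r(t) + gamma * V(t+1) - V(t)."""
    return reward + gamma * value_next - value_now


def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Move V(state) toward its target by alpha * delta (alpha is an assumed learning rate)."""
    delta = td_error(reward, values[state], values[next_state], gamma)
    values[state] += alpha * delta
    return delta


# Hypothetical chain: state 0 -> state 1 -> state 2 (terminal, value stays 0).
# Reward of 1 arrives on the transition into the terminal state.
values = [0.0, 0.0, 0.0]
for _ in range(200):
    td_update(values, 0, 1, reward=0.0)
    td_update(values, 1, 2, reward=1.0)
```

An unexpected reward gives a positive error and an omitted expected reward gives a negative one; once the values are learned, the error at both states shrinks toward zero.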

D1 effects on striatal learning: Positive PE
Three factor learning: presynaptic, postsynaptic and DA

D2 effects on striatal learning: Negative PE
Frank 2005
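The three-factor idea (presynaptic activity, postsynaptic activity, and dopamine jointly gate plasticity) with opposite dopamine effects through D1 and D2 receptors can be written as a toy update rule. This is a deliberately simplified abstraction, not the neural network model of Frank (2005): the function name, the multiplicative form, and the learning rate `lr` are assumptions for illustration.

```python
def three_factor_update(w_go, w_nogo, pre, post_go, post_nogo, dopamine, lr=0.1):
    """Toy three-factor rule: weight change = lr * pre * post * dopamine factor.

    dopamine > 0 stands for a burst (positive PE), dopamine < 0 for a dip (negative PE).
    """
    # D1 pathway (Go): bursts strengthen active synapses, dips weaken them.
    w_go = w_go + lr * pre * post_go * dopamine
    # D2 pathway (NoGo): opposite sign, so dips strengthen active synapses.
    w_nogo = w_nogo + lr * pre * post_nogo * (-dopamine)
    return w_go, w_nogo


# A burst after a rewarded action favours Go; a dip after a punished action favours NoGo.
go_after_burst, nogo_after_burst = three_factor_update(0.5, 0.5, 1.0, 1.0, 1.0, dopamine=1.0)
go_after_dip, nogo_after_dip = three_factor_update(0.5, 0.5, 1.0, 1.0, 1.0, dopamine=-1.0)
```

If either the presynaptic or the postsynaptic term is zero, nothing changes regardless of dopamine, which is the sense in which all three factors are required.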

Neural model of basal ganglia and dopamine
Integrates a wide range of data into a single coherent framework
Separate Go and NoGo populations integrate statistics of reinforcement
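[Figure: network diagram with layers Input, preSMA, Striatum (Go and NoGo), GPe, STN, GPi/SNr, Thalamus, and SNc]
Equations shown alongside the diagram:
$c \frac{dV_m}{dt} = g_e \bar{g}_e [E_e - V_m] + g_i \bar{g}_i [E_i - V_m] + g_l \bar{g}_l [E_l - V_m] + \ldots$
$y_j \approx \frac{\gamma [V_m - \Theta]_+}{\gamma [V_m - \Theta]_+ + 1}$
$net = g_e \approx \langle x_i w_{ij} \rangle + \frac{\beta}{N}$
$\Delta w_{ij} \approx (x_i^p y_j^p) - (x_i^t y_j^t)$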
Frank, 2005, 2006 J Cog Neurosci, Neural Networks
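The unit equations on this slide appear to follow the usual conductance-based point-neuron form: a membrane potential driven by excitatory, inhibitory, and leak currents, passed through a thresholded saturating rate function. The sketch below assumes that reading. All parameter values (`theta`, `gamma`, the reversal potentials, `gbar_*`, `dt`) are illustrative assumptions, not values taken from the talk.

```python
def net_input(x, w, beta=0.0):
    """Excitatory input: average of x_i * w_ij over N inputs, plus a bias term beta / N."""
    n = len(x)
    return sum(xi * wi for xi, wi in zip(x, w)) / n + beta / n


def membrane_step(v_m, g_e, g_i, g_l=1.0, dt=0.1, c=1.0,
                  gbar_e=1.0, gbar_i=1.0, gbar_l=0.1,
                  e_e=1.0, e_i=0.15, e_l=0.15):
    """One Euler step of c * dV/dt = sum over channels of g * gbar * (E - V)."""
    current = (g_e * gbar_e * (e_e - v_m)
               + g_i * gbar_i * (e_i - v_m)
               + g_l * gbar_l * (e_l - v_m))
    return v_m + (dt / c) * current


def rate_code(v_m, theta=0.25, gamma=600.0):
    """Output y = gamma*[V - theta]+ / (gamma*[V - theta]+ + 1); zero below threshold."""
    drive = gamma * max(v_m - theta, 0.0)
    return drive / (drive + 1.0)


# Drive a resting unit with a fixed excitatory input for 100 steps.
v = 0.15
g_e = net_input([1.0, 0.0], [0.5, 0.5])
for _ in range(100):
    v = membrane_step(v, g_e=g_e, g_i=0.0)
y = rate_code(v)
```

The rate function is zero below threshold and saturates toward 1 above it, so activity stays bounded however strong the input.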

Maximizing Reward via RT Adaptation:
Temporal Utility Integration Task
[Figure: three panels plotting reward frequency (probability), reward magnitude (# points gained), and expected value (frequency × magnitude) against response time (0-5000 ms) for the CEV, DEV, IEV, and CEVR conditions]

RL model: Fit to data across all subjects
RL model: adjust RTs as a function of reward prediction errors
Frank, Doll, Oas-Terpstra & Moreno (2009, Nature Neuroscience)
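One way to picture "adjust RTs as a function of reward prediction errors" is with two accumulators: positive errors build up a Go term that speeds responding, and negative errors build up a NoGo term that slows it. The sketch below is a stripped-down illustration and not the published model, which contains further terms. The names `baseline_rt`, `alpha_value`, `alpha_go`, `alpha_nogo`, and `initial_value`, and all their values, are assumptions.

```python
def simulate_rts(reward_fn, n_trials=50, baseline_rt=2000.0, initial_value=0.0,
                 alpha_value=0.1, alpha_go=0.5, alpha_nogo=0.5):
    """Simulate RTs where RT = baseline - Go + NoGo, clipped to the 0-5000 ms window.

    reward_fn maps the chosen RT (ms) to the points obtained on that trial.
    """
    value, go, nogo = initial_value, 0.0, 0.0
    rts = []
    for _ in range(n_trials):
        rt = min(max(baseline_rt - go + nogo, 0.0), 5000.0)
        rts.append(rt)
        delta = reward_fn(rt) - value      # reward prediction error
        value += alpha_value * delta       # update expected reward
        if delta > 0:
            go += alpha_go * delta         # positive PE: speed up
        else:
            nogo += alpha_nogo * -delta    # negative PE: slow down
    return rts


# Hypothetical runs: better-than-expected outcomes vs. worse-than-expected outcomes.
rts_speeding = simulate_rts(lambda rt: 100.0)
rts_slowing = simulate_rts(lambda rt: 0.0, initial_value=100.0)
```

Fitting a model of this kind to each participant's RT series yields per-subject parameters, such as separate learning rates for positive and negative errors, that can then be compared across groups.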

Neurogenetic and pharmacological modulation of
reinforcement learning parameters
Frank & Fossella, 2011

Single subject Data...
[Figure: single-subject RT (ms, 0-5000) across trials 0-50, one panel per condition: CEV, DEV, IEV, CEVR]

Exploration vs Exploitation
• By exploiting learned strategies, we know we can get a certain amount of reward
• But we don’t know how good it can get ⇒ need to explore
• Theory: explore based on relative uncertainty about whether other actions might yield better outcomes than the status quo (Dayan & Sejnowski 96)
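The theory in the last bullet can be made concrete by tracking a belief distribution over the reward probability of each option and shifting behaviour toward the option whose outcome is more uncertain. The sketch below assumes Beta beliefs over "fast" and "slow" responses with a uniform prior, and an exploration term equal to ε times the difference in their standard deviations. The Beta form, the function names, and the example counts are assumptions for illustration.

```python
import math


def beta_std(rewarded, unrewarded):
    """Standard deviation of a Beta belief after the given outcome counts (Beta(1, 1) prior)."""
    a = rewarded + 1.0
    b = unrewarded + 1.0
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    return math.sqrt(variance)


def exploration_shift(fast_counts, slow_counts, epsilon):
    """RT shift (ms) = epsilon * (sigma_slow - sigma_fast).

    Positive values push toward slower responses, negative toward faster ones.
    Each counts argument is a (rewarded, unrewarded) pair.
    """
    sigma_fast = beta_std(*fast_counts)
    sigma_slow = beta_std(*slow_counts)
    return epsilon * (sigma_slow - sigma_fast)


# Hypothetical history: fast responses sampled 20 times, slow responses only twice.
shift = exploration_shift(fast_counts=(10, 10), slow_counts=(1, 1), epsilon=1000.0)
```

With ε = 0 the term vanishes and uncertainty plays no role; a larger ε means a stronger pull toward the less-sampled option.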

Uncertainty-Based Exploration
[Figure: single subject, CEV condition: trial-to-trial RT difference (ms) and the model's exploration term across trials 5-50]

PFC Gene-Dose Effect on Uncertainty-Based Exploration
[Figure: COMT gene-dose effects on the uncertainty-exploration parameter ε (× 1e4), by genotype: val/val, val/met, met/met]
Frank, Doll, Oas-Terpstra & Moreno (2009, Nature Neuroscience)

Does the brain track relative uncertainty for exploration?
[Figure panels: ε > 0 (’explorers’); explorers > non-explorers]
Badre, Doll, Long & Frank, under review

EEG reveals temporal dynamics
Relative uncertainty represented prior to choice, and more so in exploratory trials
Cavanagh, Cohen, Figueroa & Frank, under review

Negative symptoms in schizophrenia:
Uncertainty-Based Exploration
[Figure, two panels. Uncertainty-driven exploration: ε (× 1e4) for SZ vs. CN, group difference marked **. Anhedonia & Exploration: ε(uncert) (× 1e4) against global anhedonia score (0-4), r = -.44, p = .002]
• Anhedonia = behavioral component of reward seeking (e.g., initiating social/recreational activities), not the capacity to experience pleasure
• Anhedonia is related to exploration, not to learning from reward prediction errors
Strauss et al, 2011, Biological Psychiatry

Obsessive Compulsive Disorder: Aversion to Uncertainty
[Figure: uncertainty-driven exploration parameter ε (× 1e4) for CN and OCD groups, shown separately for gains and losses]
preliminary data, N=17 per group
with Mascha van ’t Wout, Ben Greenberg, Steve Rasmussen

Summary
• Dopamine modulates reinforcement learning and choice based on
positive and negative outcomes: patients, pharmacology, genetics,
imaging
• Prefrontal cortex tracks outcome uncertainty so as to reduce it
• Disruption of these mechanisms is associated with fronto-striatal disorders: Parkinson’s, schizophrenia, OCD
• Models integrate across multiple levels of analysis, from neural mechanism to abstract computation (see Thomas Wiecki’s demonstration tomorrow!).

Thanks To...
Bradley Doll
Christina Figueroa
Jim Cavanagh
David Badre
Jeff Cockburn
Anne Collins
Thomas Wiecki
Jim Gold
Kent Hutchison
Mascha van ’t Wout
Nicole Long
Mike Cohen
Ahmed Moustafa
Scott Sherman
Lab for Neural Computation and Cognition
The patients
