Neurobiological Models of Instrumental Conditioning

Matthew J. Crossley
Department of Psychological and Brain Sciences

University of California, Santa Barbara, 93106

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of the model

• Fast Reacquisition

• Partial Reinforcement Extinction

• Renewal

III. Temporal-Difference model of DA

Why Instrumental Conditioning?
• The Ashby lab's bread and butter is category learning

• Information-Integration category-learning is a
procedural skill

• Appetitive Instrumental Conditioning is a
procedural skill

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology
Procedural Skills

Where are the tumors?

Procedural Skills Depend on the
Basal Ganglia
• Basal ganglia are a
collection of subcortical
nuclei

• Interconnected with cortex in well-defined circuits

• Striatum is a major
input structure

GPi Inhibits the Thalamus
High baseline firing rate

Striatum Disinhibits the Thalamus
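
The disinhibition idea on these two slides can be illustrated with a toy rate model. This is only a sketch under stated assumptions: the function name, the rectified-linear rate units, and every numeric value are hypothetical and are not taken from the model in the talk.

```python
def relu(x):
    """Rectify a firing rate so it cannot go below zero."""
    return max(x, 0.0)


def thalamic_output(striatal_rate, *, gpi_baseline=1.0, thalamic_drive=0.5,
                    w_striatum_gpi=1.0, w_gpi_thalamus=1.0):
    """Toy disinhibition chain: striatum -| GPi -| thalamus.

    All parameter values are illustrative assumptions.
    """
    # GPi fires at a high baseline unless the striatum inhibits it.
    gpi_rate = relu(gpi_baseline - w_striatum_gpi * striatal_rate)
    # The thalamus is held down by GPi and released when GPi is silenced.
    thalamus_rate = relu(thalamic_drive - w_gpi_thalamus * gpi_rate)
    return gpi_rate, thalamus_rate
```

With a silent striatum the thalamus stays at zero; with an active striatum GPi is suppressed and the thalamus is released.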

Procedural Learning Depends on the
Striatum
• Single-cell recordings

Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernandez, Salinas, & Romo,
1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies

Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987;
McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard &
McGaugh, 1992

• Neuropsychological patient studies

Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005;
Knowlton, Mangels, & Squire, 1996

• Neuroimaging

Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Striatal Neurons
• Medium Spiny Projection Neurons (MSNs): 96%

• GABA Interneurons: 2%

• TANs - Cholinergic Interneurons: 2%

The TANs are of Particular Interest
• Tonically active and pause to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Get major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward
(requires dopamine)

Model Architecture
Ashby and Crossley (2011)

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses
[Figure: model architecture with the Pf-TAN synapse and the CTX-MSN synapse labeled]

Network Dynamics - Early Trial
[Figure: network activity on an early trial, including SMA]

Response and Feedback
• Model responds if SMA
crosses threshold

• Model is given feedback after
every trial

CTX-MSN Synaptic Modification Requires a TANs Pause
• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000)

Calabresi, Pisani, Mercuri, & Bernardi (1996)

Reynolds & Wickens (2002)

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)
• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000)

Calabresi, Pisani, Mercuri, & Bernardi (1996)

Reynolds & Wickens (2002)

DA Encodes Reward Prediction Error (RPE)
• Elevated after unexpected reward

• Depressed after unexpected no-reward

• Unchanged from baseline when the expected outcome occurs
Bayer & Glimcher (2005)

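Computing RPE
Obtained feedback on trial n:

$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$

Predicted feedback on trial n:

$$P_n = P_{n-1} + \eta\,(R_{n-1} - P_{n-1})$$

where $\eta$ denotes the learning rate.

RPE on trial n:

$$RPE(n) = R_n - P_n$$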
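
The trial-by-trial bookkeeping behind the RPE can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the function name, the default learning rate, and the starting prediction of zero are assumptions made for the example.

```python
def rpe_sequence(feedback, learning_rate=0.1, initial_prediction=0.0):
    """Return a list of (P_n, RPE_n) pairs for a sequence of 0/1 feedback.

    P_n is updated from the previous trial's prediction and feedback,
    and RPE_n = R_n - P_n. The learning rate is an illustrative value.
    """
    prediction = initial_prediction
    previous_reward = None
    results = []
    for reward in feedback:
        if previous_reward is not None:
            # Delta-rule update using the previous trial's outcome.
            prediction += learning_rate * (previous_reward - prediction)
        results.append((prediction, reward - prediction))
        previous_reward = reward
    return results
```

Feeding it a run of rewarded trials followed by an unrewarded one shows the RPE shrinking as reward becomes expected and then turning negative when the reward is omitted.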

DA Released on Trial n

$$DA(n) = \begin{cases} 1 & \text{if } RPE > 1 \\ 0.8\,RPE + 0.2 & \text{if } -0.25 < RPE \le 1 \\ 0 & \text{if } RPE < -0.25 \end{cases}$$
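
The piecewise mapping from RPE to DA release is easy to check numerically. The sketch below assumes the lower breakpoint is -0.25, which makes the linear segment meet zero continuously; the function name is hypothetical.

```python
def dopamine_release(rpe):
    """Map a reward prediction error to a DA level in [0, 1].

    Assumes breakpoints at RPE = -0.25 and RPE = 1, so that an RPE of 0
    gives the baseline level of 0.2.
    """
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0
```

Under this mapping an RPE of zero gives a DA level of 0.2, so positive surprises have more room to raise DA than negative surprises have to lower it.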

Updating Synapses in the Model

$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D(n) - D_{base}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D_{base} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{NMDA} - S_J(n)]^+\,[S_J(n) - \theta_{AMPA}]^+\,w_{K,J}(n).
\end{aligned}
$$

Here $[x]^+ = x$ if $x > 0$ and $0$ otherwise.

• Synaptic Strengthening: the $\alpha_w$ term

• Synaptic Weakening: the $\beta_w$ and $\gamma_w$ terms

• Presynaptic Activity: $I_K(n)$

• Postsynaptic Activation: $S_J(n)$, compared against $\theta_{NMDA}$ and $\theta_{AMPA}$

• Elevated DA: $[D(n) - D_{base}]^+$ (strengthening)

• Depressed DA: $[D_{base} - D(n)]^+$ (weakening)
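
A single-synapse version of this update can be sketched as follows. It implements one strengthening term (strong presynaptic and postsynaptic activation with DA above baseline) and two weakening terms (DA below baseline, or postsynaptic activation between the AMPA and NMDA thresholds). The function and argument names are hypothetical, and all thresholds and rate constants are placeholder values chosen for illustration, not the values used in Ashby and Crossley (2011).

```python
def pos(x):
    """Positive part: x if x > 0, else 0."""
    return max(x, 0.0)


def update_weight(w, pre, post, da, *, da_base=0.2, theta_nmda=0.5,
                  theta_ampa=0.1, alpha=0.1, beta=0.1, gamma=0.1):
    """One trial of the three-term weight update (illustrative parameters).

    w: current weight in [0, 1]; pre: presynaptic activity;
    post: postsynaptic activation; da: DA released on this trial.
    """
    # Strengthening: post above NMDA threshold and DA above baseline.
    ltp = alpha * pre * pos(post - theta_nmda) * pos(da - da_base) * (1.0 - w)
    # Weakening: post above NMDA threshold and DA below baseline.
    ltd_da = beta * pre * pos(post - theta_nmda) * pos(da_base - da) * w
    # Weakening: post between the AMPA and NMDA thresholds.
    ltd_weak = gamma * pre * pos(theta_nmda - post) * pos(post - theta_ampa) * w
    return w + ltp - ltd_da - ltd_weak
```

Because every term is multiplied by the presynaptic activity, a synapse whose cortical input is silent on a trial is left unchanged.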

Network Dynamics - Late Trial
[Figure: network activity on a late trial, including SMA]

Model Accounts for Electrophysiological
Recordings from TANs

Model Accounts for Electrophysiological
Recordings from MSNs

Fast Reacquisition
Fast reacquisition is evidence that extinction
did not erase initial learning

Fast Reacquisition Mechanics
TANs quickly stop pausing, and thereby
protect cortico-striatal synapses

Partial Reinforcement Extinction (PRE)
Extinction is slower when acquisition
is trained with partial reinforcement

PRE Mechanics
TANs take longer to stop pausing
under partial reinforcement

Slowed Reacquisition

Phase / Condition  Ext2              Ext8              Prf2           Prf8
Acquisition        VI-30 sec         VI-30 sec         VI-30 sec      VI-30 sec
Extinction         No Reinforcement  No Reinforcement  Lean Schedule  Lean Schedule
Reacquisition      VI-2 min          VI-8 min          VI-2 min       VI-8 min

Woods and Bouton (2007)

Behavioral Results
Crossley, Horvitz, Balsam, & Ashby (in prep)

Modeling Results

TANs don’t stop pausing during
extinction in Prf Conditions
[Figure panels: CTX-MSN synapse; Pf-TAN synapse]

Renewal - Basic Design

Phase / Condition     ABA            AAB            ABC
Acquisition           Environment A  Environment A  Environment A
Extinction            Environment B  Environment A  Environment B
Renewal (Extinction)  Environment A  Environment B  Environment C

Bouton et al. (2011)

Model Architecture

Synaptic Plasticity at ALL Pf-TAN
Synapses

Renewal

ABA Mechanics
Net Pf-TAN synaptic weight is the average of all
active Pf-TAN synapses
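
The averaging rule can be made concrete with a small sketch. The unit names ("cue", "env_A", "env_B") and the weight values are hypothetical, invented only to show how the net weight changes with the set of active synapses; they are not taken from the model.

```python
def net_pf_tan_weight(weights, active_units):
    """Average the weights of the currently active Pf-TAN synapses.

    weights: dict mapping a (hypothetical) unit name to its synaptic weight.
    active_units: iterable of unit names that are active right now.
    """
    active = [weights[name] for name in active_units]
    if not active:
        return 0.0  # no active synapses, no net drive
    return sum(active) / len(active)


# Illustrative weights only.
example_weights = {"cue": 0.5, "env_A": 0.9, "env_B": 0.1}
```

With these made-up numbers the same cue yields a net weight near 0.7 in environment A and near 0.3 in environment B, which is the kind of context dependence the averaging rule allows.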

Instrumental Conditioning Summary
• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during
extinction leave learning at the CTX-MSN synapse
subject to change.

Untested Physiological
Predictions
• Development of the TANs pause precedes development of category-specific responses in MSNs

• TANs should stop pausing during extinction

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of the model

• Fast Reacquisition

• Partial Reinforcement Extinction

• Renewal

III. Temporal-Difference (TD) model of DA

Putting TD into the model
We want to replace the discrete-trial model of DA with a continuous-time model

The TD Prediction Error
[Figure: prediction error by trial and time step]

$$\delta_t = r_t + V(t+1) - V(t)$$

$$r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}$$

Montague, Dayan, & Sejnowski (1996), Journal of Neuroscience, 16(5), 1936-1947
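
The prediction error $\delta_t = r_t + V(t+1) - V(t)$ can be computed time step by time step within a trial. The sketch below is a generic tabular TD(0) illustration, not the talk's spiking implementation: it assumes one value estimate per time step, a value of zero after the last step, and an arbitrary learning rate; the function names are hypothetical.

```python
def td_errors(values, rewards):
    """Compute delta_t = r_t + V(t+1) - V(t) for each time step of a trial.

    values and rewards are equal-length lists; the value after the final
    time step is assumed to be zero.
    """
    n_steps = len(rewards)
    deltas = []
    for t in range(n_steps):
        next_value = values[t + 1] if t + 1 < n_steps else 0.0
        deltas.append(rewards[t] + next_value - values[t])
    return deltas


def run_trial(values, rewards, learning_rate=0.1):
    """Run one trial and return (updated values, prediction errors)."""
    deltas = td_errors(values, rewards)
    new_values = [v + learning_rate * d for v, d in zip(values, deltas)]
    return new_values, deltas
```

Running two trials with reward on the last time step shows the prediction error at the reward shrinking while an error appears one step earlier, which is the backward propagation characteristic of TD learning.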

Model Architecture
Spiking neuron driven by the TD prediction error:

$$\delta_t = r_t + V(t+1) - V(t)$$

TANs were removed for initial TD applications

We Need Modified Learning Equations
The discrete-trial update for $w_{K,J}(n+1)$ uses one DA value per trial, $D(n)$: synaptic strengthening scales with $[D(n) - D_{base}]^+$ and DA-dependent synaptic weakening scales with $[D_{base} - D(n)]^+$.

DA is no longer modeled on a discrete trial-by-trial basis!

CaMKII, PP-1 and Striatal Plasticity

Learning Equations

$$
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{CaMKII}(t) - S_{CaMKII\,base}]^+\,[D_{PP\text{-}1}(t) - D_{base}]^+\,[w_{max} - w(n)]\,dt \\
&- \beta_w \int [S_{CaMKII}(t) - S_{CaMKII\,base}]^+\,[D_{base} - D_{PP\text{-}1}(t)]^+\,w(n)\,dt
\end{aligned}
$$

• Synaptic Strengthening: the $\alpha_w$ term

• Synaptic Weakening: the $\beta_w$ term

• CaMKII Activity: $S_{CaMKII}(t)$

• PP-1 Activity: $D_{PP\text{-}1}(t)$
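
One way to evaluate a within-trial update of this form numerically is a simple Riemann sum over time steps. The sketch below assumes the update integrates over the time course of a single trial, with the CaMKII and PP-1 traces sampled at a fixed step `dt`; the function name and all parameter values are hypothetical.

```python
def pos(x):
    """Positive part: x if x > 0, else 0."""
    return max(x, 0.0)


def update_weight_continuous(w, camkii, pp1, dt, *, camkii_base=0.5,
                             da_base=0.2, w_max=1.0, alpha=1.0, beta=1.0):
    """Update one weight from sampled CaMKII and PP-1 traces of a trial.

    camkii, pp1: equal-length lists sampled every dt time units.
    Parameter values are illustrative assumptions.
    """
    ltp_integral = 0.0
    ltd_integral = 0.0
    for s, d in zip(camkii, pp1):
        gate = pos(s - camkii_base)  # CaMKII activity above its baseline
        ltp_integral += gate * pos(d - da_base) * dt
        ltd_integral += gate * pos(da_base - d) * dt
    # w(n) is constant within the trial, so it factors out of both sums.
    return w + alpha * ltp_integral * (w_max - w) - beta * ltd_integral * w
```

Because both terms are gated by CaMKII activity above baseline, time steps with low CaMKII activity contribute nothing regardless of the DA-related signal.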

Acquisition and Extinction
[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]

MSN and SNc
[Figure: MSN output and SNc output, each by trial and time step]

CaMKII and PP-1
DA model learns very quickly that reward is taken away
[Figure: two panels, each by trial and time step]

Extinction under noncontingent reward delivery
[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]

MSN and SNc
Noncontingent reward delivery keeps DA surprised
[Figure: MSN output and SNc output, each by trial and time step]

CaMKII and PP-1
Noncontingent reward delivery keeps DA surprised
[Figure: two panels, each by trial and time step]

Summary and Future Directions
• TANs need to be added to account for
reacquisition, renewal, and other effects after
extinction with noncontingent reward

• TD model might need to be modified once the TANs are included and post-extinction effects are examined

Acknowledgments
Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam

Funding:

NIMH Grant MH3760-2,
Todd Wilkinson
