Neurobiological Models of Instrumental Conditioning
1. Neurobiological Models of Instrumental Conditioning
Matthew J. Crossley
Department of Psychological and Brain Sciences
University of California, Santa Barbara, 93106
2. Outline
I. A neurobiological model of appetitive instrumental conditioning
II. Applications of model
Fast Reacquisition
Partial Reinforcement Extinction
Renewal
III. Temporal-Difference model of DA
3. Why Instrumental Conditioning?
• The Ashby lab's bread and butter is category learning
• Information-integration category learning is a procedural skill
• Appetitive instrumental conditioning is a procedural skill
4. Procedural Skills
• Learned incrementally from feedback
• Model-free reinforcement learning
• Habitual control
• E.g., riding a bike or playing an instrument
• E.g., radiology
7. Procedural Skills Depend on the Basal Ganglia
• Basal ganglia are a collection of subcortical nuclei
• Interconnect with cortex in well-defined circuits
• Striatum is a major input structure
16. The TANs are of Particular Interest
• Tonically active; pause in response to excitatory input
• Presynaptically inhibit cortical input to MSNs
• Get major input from CM-Pf (thalamus)
• Learn to pause to stimuli that predict reward (requires dopamine)
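The gating role described in these bullets can be illustrated with a toy calculation. This is only a sketch, not the model from the talk: the function name `msn_drive`, the 0-to-1 scaling of TAN activity, and the multiplicative form of the inhibition are all assumptions made for illustration.

```python
def msn_drive(cortical_input, tan_activity):
    """Cortical drive that reaches the MSN after presynaptic inhibition.

    tan_activity is a hypothetical normalized value in [0, 1]:
    1.0 means the TAN is firing tonically, 0.0 means it is fully paused.
    The multiplicative gate is an illustrative assumption.
    """
    if not 0.0 <= tan_activity <= 1.0:
        raise ValueError("tan_activity must lie in [0, 1]")
    return cortical_input * (1.0 - tan_activity)


# Tonically active TAN: cortical input to the MSN is blocked.
drive_tonic = msn_drive(1.0, 1.0)
# Paused TAN: cortical input passes through to the MSN.
drive_paused = msn_drive(1.0, 0.0)
```

Under these assumptions, cortical input only reaches the MSN while the TAN pauses.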
17. Outline
I. A neurobiological model of appetitive instrumental conditioning
II. Applications of model
Fast Reacquisition
Partial Reinforcement Extinction
Renewal
III. Temporal-Difference model of DA
40. Model Accounts for Electrophysiological Recordings from TANs
Ashby and Crossley (2011)
41. Model Accounts for Electrophysiological Recordings from MSNs
Ashby and Crossley (2011)
42. Outline
I. A neurobiological model of appetitive instrumental conditioning
II. Applications of model
Fast Reacquisition
Partial Reinforcement Extinction
Renewal
III. Temporal-Difference model of DA
43. Fast Reacquisition
Ashby and Crossley (2011)
Fast reacquisition is evidence that extinction did not erase initial learning
51. TANs don’t stop pausing during extinction in partial reinforcement conditions
[Figure panels: CTX-MSN Synapse; Pf-TAN Synapse]
52. Renewal - Basic Design
Phase                   Condition ABA    Condition AAB    Condition ABC
Acquisition             Environment A    Environment A    Environment A
Extinction              Environment B    Environment A    Environment B
Renewal (Extinction)    Environment A    Environment B    Environment C
Bouton et al. (2011)
57. ABA Mechanics
Crossley, Horvitz, Balsam, & Ashby (in prep)
Net Pf-TAN synaptic weight is the average of the weights of all active Pf-TAN synapses
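The averaging rule on this slide can be sketched in a few lines. The synapse names (`context_A`, `context_B`, `cue`) and the weight values are hypothetical placeholders chosen for illustration; they are not taken from the model.

```python
def net_pf_tan_weight(weights, active):
    """Average the weights of the currently active Pf-TAN synapses.

    weights: dict mapping a synapse name to its weight
    active:  iterable of the names of the synapses active right now
    Returns 0.0 when no synapse is active (an assumption for this sketch).
    """
    active_weights = [weights[name] for name in active]
    if not active_weights:
        return 0.0
    return sum(active_weights) / len(active_weights)


# Hypothetical weights: one synapse per environment plus one for the cue.
weights = {"context_A": 0.75, "context_B": 0.25, "cue": 0.5}

net_in_A = net_pf_tan_weight(weights, ["context_A", "cue"])  # (0.75 + 0.5) / 2
net_in_B = net_pf_tan_weight(weights, ["context_B", "cue"])  # (0.25 + 0.5) / 2
```

With these placeholder values the net weight is larger in Environment A than in Environment B, which shows how a change of environment alone can shift the net Pf-TAN weight under the averaging rule.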
58. Instrumental Conditioning Summary
• The TANs protect learning at CTX-MSN synapses.
• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.
60. Outline
I. A neurobiological model of appetitive instrumental conditioning
II. Applications of model
Fast Reacquisition
Partial Reinforcement Extinction
Renewal
III. Temporal-Difference (TD) model of DA
61. Putting TD into the model
We want to replace the discrete-trial model of DA with a continuous-time model
63. The TD Prediction Error
$$\delta_t = r_t + V(t+1) - V(t)$$
$$r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}$$
Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947.
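A small worked example may help make the prediction error concrete. The sketch below applies the slide's formula (reward plus next value minus current value, with no discount factor) to a short sequence of time steps. The function name `td_errors` and the reward and value numbers are made up for illustration.

```python
def td_errors(rewards, values):
    """TD prediction error at each time step: r_t + V(t+1) - V(t).

    rewards: list of r_t (1 if reward at time t, 0 otherwise)
    values:  list of V(t) with one extra entry for the step after the last reward
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs exactly one more entry than rewards")
    return [r + values[t + 1] - values[t] for t, r in enumerate(rewards)]


# Hypothetical trial: reward arrives at the last of three time steps,
# and the value estimate ramps up as the reward approaches.
rewards = [0, 0, 1]
values = [0.0, 0.5, 1.0, 0.0]

deltas = td_errors(rewards, values)
print(deltas)  # [0.5, 0.5, 0.0]
```

In this example the error is zero at the time of reward because the reward is fully predicted by V at that step; the positive errors occur earlier, where the value estimate rises.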
64. Model Architecture
Spiking neuron driven by TD prediction error:
$$\delta_t = r_t + V(t+1) - V(t)$$
TANs were removed for initial TD applications
77. Summary and Future Directions
• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward
• TD model might need to be modified once the TANs are included and post-extinction effects are examined