Neurobiological Models and Research Themes

Matthew J. Crossley
Department of Psychological and Brain Sciences

University of California, Santa Barbara, 93106

Talk Goals
I. A neurobiological model of appetitive instrumental
conditioning

II. Overview of my research

III. Contribution to the Ivry lab

Why Instrumental Conditioning?
• The Ashby lab's bread and butter is category
learning

• Information-Integration category-learning is a
procedural skill

• Appetitive Instrumental Conditioning is a
procedural skill

Part I Outline
• Procedural Skills

• Model Architecture

• Instrumental Conditioning Applications

• Instrumental Conditioning Summary

Procedural Skills
• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Procedural Skills
Where are the tumors?

Procedural Skills Depend on the
Basal Ganglia
• Basal ganglia are a
collection of subcortical
nuclei

• Interconnected with
cortex in well-defined
circuits

• Striatum is a major
input structure

GPi Inhibits the Thalamus
High baseline firing
rate

Striatum Disinhibits the Thalamus

Procedural Learning Depends on the
Striatum
• Single-cell recordings

Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo,
1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995

• Lesion studies

Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987;
McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard &
McGaugh, 1992

• Neuropsychological patient studies

Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005;
Knowlton, Mangels, & Squire, 1996

• Neuroimaging

Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011

Striatal Neurons
• Medium Spiny Projection Neurons (MSNs): 96%

• GABA Interneurons: 2%

• TANs - Cholinergic Interneurons: 2%

The TANs are of Particular Interest
• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Get major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward
(requires dopamine)





Model Architecture
Ashby and Crossley (2011)

Learning Occurs at the CTX-MSN
Synapse and at Pf-TAN Synapses
[Figure: network diagram with the Pf-TAN synapse and the CTX-MSN synapse labeled]

Network Dynamics - Early Trial
[Figure: network dynamics on an early trial; panel label: SMA]

Response and Feedback
• Model responds if SMA
crosses threshold

• Model is given feedback after
every trial

CTX-MSN Synaptic Modification
Requires a TANs Pause
• Synaptic Strengthening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Elevated DA levels
• Synaptic Weakening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000)

Calabresi, Pisani, Mercuri, & Bernardi (1996)

Reynolds & Wickens (2002)

Synaptic Plasticity in the Striatum
Depends on Dopamine (DA)
• Synaptic Strengthening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Elevated DA levels
• Synaptic Weakening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000)

Calabresi, Pisani, Mercuri, & Bernardi (1996)

Reynolds & Wickens (2002)

DA Encodes Reward Prediction Error
(RPE)
• Elevated after unexpected
reward

• Depressed after unexpected
no-reward

• Unchanged when the expected
outcome occurs
Bayer & Glimcher (2005)

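Computing RPE
Obtained feedback on trial n:
\[ R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases} \]
Predicted feedback on trial n:
\[ P_n = P_{n-1} + \alpha \, (R_{n-1} - P_{n-1}) \]
where \alpha is a learning rate.
RPE on trial n:
\[ RPE(n) = R_n - P_n \]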
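The RPE computation can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual code: it assumes the prediction is updated by a delta rule with a learning rate, and the names `run_rpe`, `alpha`, and `initial_prediction`, along with the default values, are hypothetical.

```python
def run_rpe(rewards, alpha=0.1, initial_prediction=0.0):
    """Return the reward prediction error on each trial.

    rewards: sequence of obtained feedback values R_n (1 = positive, 0 = otherwise).
    alpha: assumed learning rate for the prediction update (hypothetical value).
    initial_prediction: assumed starting prediction (hypothetical value).
    """
    prediction = initial_prediction
    rpes = []
    for reward in rewards:
        # RPE(n) = R_n - P_n
        rpe = reward - prediction
        rpes.append(rpe)
        # The next prediction moves a fraction alpha toward the obtained feedback.
        prediction = prediction + alpha * (reward - prediction)
    return rpes


# Three rewarded trials in a row: the RPE shrinks as reward becomes expected.
example_rpes = run_rpe([1, 1, 1], alpha=0.5)
```

With repeated reward the prediction approaches 1 and the RPE approaches 0, which is the "expected outcome" case in which DA stays at baseline.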

DA Released on Trial n
\[
DA(n) = \begin{cases}
1 & \text{if } RPE > 1 \\
0.8\,RPE + 0.2 & \text{if } -0.25 < RPE \le 1 \\
0 & \text{if } RPE < -0.25
\end{cases}
\]
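A short Python sketch of this piecewise mapping follows. It assumes the lower breakpoint is -0.25 (so that DA sits at a baseline of 0.2 when RPE = 0 and falls to 0 for sufficiently negative RPE); the function name `dopamine_release` is hypothetical.

```python
def dopamine_release(rpe):
    """Map a reward prediction error onto a DA level between 0 and 1.

    Assumes breakpoints at RPE = -0.25 and RPE = 1, with a linear
    segment 0.8 * RPE + 0.2 between them.
    """
    if rpe > 1.0:
        return 1.0
    if rpe < -0.25:
        return 0.0
    # Linear segment; the boundary value RPE = -0.25 is handled here
    # and yields a DA level of 0.
    return 0.8 * rpe + 0.2


baseline = dopamine_release(0.0)   # expected outcome: DA at baseline
burst = dopamine_release(1.0)      # fully unexpected reward
dip = dopamine_release(-1.0)       # fully unexpected no-reward
```

Because baseline is 0.2 rather than 0.5, the mapping is asymmetric: DA has more room to rise above baseline than to fall below it.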

Updating Synapses in the Model
\[
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
 &+ \alpha_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D(n) - D_{base}]^+\,[1 - w_{K,J}(n)] \\
 &- \beta_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D_{base} - D(n)]^+\,w_{K,J}(n) \\
 &- \gamma_w I_K(n)\,[\theta_{NMDA} - S_J(n)]^+\,[S_J(n) - \theta_{AMPA}]^+\,w_{K,J}(n).
\end{aligned}
\]
• Presynaptic activity: I_K(n)

• Postsynaptic activation: S_J(n), compared against \theta_{NMDA} and \theta_{AMPA}

• Elevated DA: [D(n) - D_{base}]^+

• Depressed DA: [D_{base} - D(n)]^+

• Synaptic strengthening: the added term

• Synaptic weakening: the subtracted terms
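The three-factor update can be sketched for a single synapse as below. This is an illustrative sketch under stated assumptions: the function names, the keyword arguments, and every default parameter value are hypothetical and are not the values used in the model.

```python
def rectify(x):
    """[x]^+ : x if positive, otherwise 0."""
    return max(x, 0.0)


def update_weight(w, pre, post, da, *, da_base=0.2, theta_nmda=0.5,
                  theta_ampa=0.1, alpha_w=0.1, beta_w=0.1, gamma_w=0.1):
    """One-trial update of a cortico-striatal weight w (kept between 0 and 1).

    pre:  presynaptic activity I_K(n)
    post: postsynaptic activation S_J(n)
    da:   dopamine level D(n)
    All thresholds and rates are placeholder values for illustration.
    """
    # Strengthening: strong pre, post above the NMDA threshold, DA above baseline.
    ltp = alpha_w * pre * rectify(post - theta_nmda) * rectify(da - da_base) * (1.0 - w)
    # Weakening: strong pre, post above the NMDA threshold, DA below baseline.
    ltd_da = beta_w * pre * rectify(post - theta_nmda) * rectify(da_base - da) * w
    # Weakening: post between the AMPA and NMDA thresholds, regardless of DA.
    ltd_weak = gamma_w * pre * rectify(theta_nmda - post) * rectify(post - theta_ampa) * w
    return w + ltp - ltd_da - ltd_weak


w_after_reward = update_weight(0.5, pre=1.0, post=1.0, da=1.0)     # elevated DA
w_after_omission = update_weight(0.5, pre=1.0, post=1.0, da=0.0)   # depressed DA
```

The factors (1 - w) and w keep the weight bounded: strengthening slows as w approaches 1, and weakening slows as w approaches 0.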

Network Dynamics - Late Trial
[Figure: network dynamics on a late trial; panel label: SMA]

Model Accounts for Electrophysiological
Recordings from TANs

Model Accounts for Electrophysiological
Recordings from MSNs





Fast Reacquisition
Fast reacquisition is evidence that extinction
did not erase initial learning

Fast Reacquisition Mechanics
TANs quickly stop pausing, and thereby
protect cortico-striatal synapses

Partial Reinforcement Extinction (PRE)
Extinction is slower when acquisition
is trained with partial reinforcement

PRE Mechanics
TANs take longer to stop pausing
under partial reinforcement

Slowed Reacquisition
Phase / Condition   Ext2               Ext8               Prf2            Prf8
Acquisition         VI-30 sec          VI-30 sec          VI-30 sec       VI-30 sec
Extinction          No Reinforcement   No Reinforcement   Lean Schedule   Lean Schedule
Reacquisition       VI-2 min           VI-8 min           VI-2 min        VI-8 min
Woods and Bouton (2007)

Behavioral Results
Crossley, Horvitz, Balsam, & Ashby (in prep)

Modeling Results

TANs don’t stop pausing during
extinction in Prf Conditions
[Figure panels: CTX-MSN Synapse; Pf-TAN Synapse]

Renewal - Basic Design
Phase / Condition      ABA             AAB             ABC
Acquisition            Environment A   Environment A   Environment A
Extinction             Environment B   Environment A   Environment B
Renewal (Extinction)   Environment A   Environment B   Environment C
Bouton et al. (2011)

Model Architecture

Synaptic Plasticity at ALL Pf-TAN
Synapses

Renewal

ABA Mechanics
Net Pf-TAN synaptic weight is the average of all
active Pf-TAN synapses
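The averaging rule can be illustrated with a small Python sketch. The synapse labels and weight values below are made up for illustration, and returning 0 when no synapse is active is an assumption of this sketch rather than something stated on the slide.

```python
def net_pf_tan_weight(weights, active):
    """Average the weights of the currently active Pf-TAN synapses.

    weights: dict mapping a synapse label to its weight.
    active:  collection of labels for the synapses active in the current environment.
    """
    active_weights = [weights[name] for name in active if name in weights]
    if not active_weights:
        # Assumption: no active synapses means no net drive.
        return 0.0
    return sum(active_weights) / len(active_weights)


# Hypothetical weights: one synapse shared across environments, one per environment.
weights = {"shared_cue": 0.2, "environment_A": 0.8, "environment_B": 0.1}
net_in_a = net_pf_tan_weight(weights, ["shared_cue", "environment_A"])
net_in_b = net_pf_tan_weight(weights, ["shared_cue", "environment_B"])
```

Under these made-up numbers the net weight is higher in environment A than in environment B, which shows how returning to the acquisition environment could change the net Pf-TAN drive even though no single synapse has changed.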

Instrumental Conditioning Summary
• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during
extinction leave learning at the CTX-MSN synapse
subject to change.

Talk Goals
I. A neurobiological model of appetitive
instrumental conditioning

II. Overview of my research

III. Contribution to the Ivry lab

Category Learning: The Basics
A or B

Rule-Based Category Learning
[Figure; axes: Spatial Frequency, Orientation]

Information-Integration Category
Learning
[Figure; axes: Spatial Frequency, Orientation]

Many Qualitative Differences
Between RB and II
                                   RB    II
Unsupervised learning              Yes   No
Observational learning             Yes   No
Dual-task interference             Yes   No
Time needed to process feedback    Yes   No
Interference from button switch    No    Yes
Interference from feedback delay   No    Yes
II Category Learning is a Procedural Skill

Major Research Themes
• Unlearning

• System Interaction

• Miscellaneous

Unlearning Experiment Design
Crossley, Maddox & Ashby (under review)
Phase / Condition   Active Condition        Meta-Learning Condition
Acquisition         True Feedback           True Feedback
Extinction          Feedback Manipulation   Feedback Manipulation
Reacquisition       True Feedback           True Feedback, New Categories

We Achieved Unlearning
Unlearning requires partially-contingent feedback

Theoretical Account
Network architecture and new DA model
• DA is RPE scaled by response-feedback contingency

System Interaction Theme
• Development of TANs pause precedes
development of category-specific responses in
MSNs

• TANs should stop pausing during extinction (i.e.,
reward removal in instrumental conditioning and
noncontingent feedback in category learning).

• Phasic DA response should be scaled by response-
feedback contingency.
• Do systems cooperate to learn optimal behavior?

• What does it take to get system-switching?

• Does the procedural system learn during declarative
control?

• What mechanistic models describe system switching
throughout learning?

• What is the correct neurobiological model of
system switching?

Do Systems Cooperate?
Perfect accuracy is possible with trial-by-trial switching
between RB and II strategies
Ashby & Crossley (2010)
2 days (1200 trials) of training on:

Systems Compete
[Figure: Decision-Bound Model Fit Summary. Number of participants (axis 0 to 20) best fit by each model type (Guessing, Rule-Based, Information-Integration, Hybrid), in three panels: Information-Integration, Uniform Hybrid, Non-Uniform Hybrid]
Almost nobody was best fit by a hybrid model
Ashby & Crossley (2010)

What does it take to get
successful system switching?
[Figure; panel labels: A, B, C, D]
Behavioral: Crossley, Roeder & Ashby (in prep)
fMRI: Turner, Crossley & Ashby (in prep)

Crossley, Roeder & Ashby (in prep)
Successful System-Switching
Training Protocol

• 100 RB trials

• 400 II trials

• 300 intermixed trials

• 100 button-switched
intermixed trials

Successful System-Switching
Button Switch
Crossley, Roeder & Ashby (in prep)
Persistent button-switch interference on II trials but not RB
trials supports true system switching
[Figure; axis label: Button Switch Interference]

Does the procedural system learn during declarative
control?
Conditions

• Transfer Positive

• All Positive

• Transfer Negative

• All Negative
Crossley & Ashby (in prep)

Potential for weak bootstrapping
Small but significant performance hit in the
Transfer Negative condition
during the first 50 trials after
transfer
[Figure; phase labels: Train, Transfer]
Crossley & Ashby (in prep)

System Interaction Theme


• Do systems cooperate to learn optimal behavior?

• What does it take to get system-switching?

• Does the II system learn during RB control?

• What mechanistic models describe system switching
throughout learning?

• What is the correct neurobiological model of
system switching?

Explicitly Modeling System Switching
Turner, Crossley & Ashby (in prep)

Neurobiological Models of
System Interaction

Category Structure and
Feedback Effects


• What system learns unstructured categories?

• Does probabilistic feedback induce procedural
learning?

The Experiment
Crossley, Madsen & Ashby (in prep)
Conditions

• Unstructured - Deterministic

• Unstructured - Probabilistic

• Rule-based - Deterministic

• Rule-based - Probabilistic

The Experiment
[Figure panels: Accuracy; Reaction Time]
Button-switch effect on unstructured categories suggests
procedural control

Learning Under a Dual-Task


• Hypothesis 1: Dual-task induces procedural control.

• Hypothesis 2: Dual-task only slows the declarative
system down.
RB category learning with a simultaneous numerical Stroop task

The Experiment
Paul, Crossley & Ashby (in prep)
• Every participant does either RB or II structures with:

• Single-task, button-switch

• Dual-task, button-switch


Contribution to the Ivry Lab
I. Lots of room to build spiking networks

• Hand / Object Choice networks

• Inhibitory Control and Competition Resolution

• Supervised learning in the cerebellum

• Model of timing in instrumental conditioning

II. Object choice, hand choice, and categorization:
Experiment ideas

Spiking Networks of Hand and Object Choice
Motivation

• Predictive clarity

• Model-based imaging

• Natural ability to account for
patient data

• Generate new experiments

Supervised Learning in the Cerebellum
Hypothesized hand and object choice brain systems
operate with different learning algorithms.
Doya, 2000

Spiking Networks of Inhibitory Control (IC) and Competition Resolution (CR)
• Role of the hyperdirect
pathway?

• Relationship to our studies of
system switching?

Object choice, hand choice, and categorization experiment ideas
Many of the tools used to dissociate RB and II
category learning systems might be used to
dissociate hand choice from object choice, and
subsystems thereof.

• Feedback delay

• Time needed to process feedback

• Feedback contingency

• Automaticity

Acknowledgments
Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam

Funding:

NIMH Grant MH3760-2,
Todd Wilkinson
