Real-time optimization of neurophysiology experiments
Jeremy Lewi¹, Robert Butera¹, Liam Paninski²
¹ Department of Bioengineering, Georgia Institute of Technology
² Department of Statistics, Columbia University
Neural Encoding
The neural code: what is P(response | stimulus)?
Main question: how do we estimate P(r|x) from (sparse) experimental data?
Curse of dimensionality
Both stimuli and responses can be very high-dimensional
Stimuli:
• Images
• Sounds
• Time-varying behavior
Responses:
• observations from single or multiple simultaneously recorded point processes
All experiments are not equally informative
[Figure: the set of possible p(r|x), and the smaller sets of possible p(r|x) remaining after experiment A and after experiment B]
Goal: constrain the set of possible systems as much as possible
How: maximize the mutual information I({experiment}; {possible systems})
Adaptive optimal design of experiments
Assume:
• parametric model p(r|x,θ) of responses r on stimulus x
• prior distribution p(θ) on a finite-dimensional parameter space
Goal: estimate θ from data
Usual approach: draw stimuli i.i.d. from a fixed p(x)
Adaptive approach: choose p(x) on each trial to maximize I(θ;{r,x})
Theory: info. max is better
1. Info. max. is in general more efficient and never worse than random sampling [Paninski 2005]
2. Gaussian approximations are asymptotically accurate
Computational challenges
I(r_t; \theta \mid \vec{x}_t, \vec{r}_{t-1}) = \sum_{r_t} \int_{\theta} p(r_t \mid x_t, \theta)\, p(\theta \mid \vec{x}_{t-1}, \vec{r}_{t-1}) \log \frac{p(r_t \mid x_t, \theta)}{\int_{\theta'} p(r_t \mid x_t, \theta')\, p(\theta' \mid \vec{x}_{t-1}, \vec{r}_{t-1})\, d\theta'}\, d\theta

1. Updating the posterior p(θ|x,r)
   • Difficult to represent/manipulate high-dimensional posteriors
2. Maximizing the mutual information I(r;θ|x)
   • High-dimensional integration
   • High-dimensional optimization
3. Computations need to be performed quickly: 10 ms – 1 sec
   • Speed limits the number of trials
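To make the cost of that integration concrete, here is a brute-force sketch (my own illustration, not the authors' code) that estimates I(r;θ|x) for a single candidate stimulus by nested Monte Carlo over the Poisson GLM introduced below; the function name, sample sizes, and spike-count truncation are all assumptions:

```python
import numpy as np
from scipy.stats import poisson

def mi_monte_carlo(x, mu, C, n_theta=2000, r_max=30, seed=0):
    """Brute-force estimate of I(r; theta | x) for one candidate stimulus x,
    approximating the current posterior over theta by N(mu, C) and using the
    Poisson model p(r | x, theta) with rate exp(theta . x).
    Purely illustrative: this nested sampling is exactly the high-dimensional
    integration that the approximations below are designed to avoid."""
    rng = np.random.default_rng(seed)
    thetas = rng.multivariate_normal(mu, C, size=n_theta)   # draws from p(theta | data)
    rates = np.exp(thetas @ x)                               # lambda_k = exp(theta_k . x)
    r = np.arange(r_max + 1)                                  # truncate the sum over spike counts

    # p(r | x, theta_k) for every posterior sample and every spike count
    log_pr_given_theta = poisson.logpmf(r[None, :], rates[:, None])
    # marginal p(r | x) = E_theta[ p(r | x, theta) ]; floor avoids log(0)
    log_pr = np.log(np.maximum(np.exp(log_pr_given_theta).mean(axis=0), 1e-300))

    # I(r; theta | x) = E_theta E_{r|theta}[ log p(r|x,theta) - log p(r|x) ]
    pr_given_theta = np.exp(log_pr_given_theta)
    return float(np.mean(np.sum(pr_given_theta * (log_pr_given_theta - log_pr[None, :]), axis=1)))
```

Running something like this for every candidate stimulus on every trial, in high dimensions and within the 10 ms – 1 sec budget, is infeasible; the GLM and Gaussian approximations that follow replace it with closed-form expressions.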
Solution Overview
1. Model responses using a 1-d GLM
• Computationally tractable
2. Approximate posterior as Gaussian
• easy to work with even in high dimensions
3. Reduce the optimization of the mutual information to a 1-d problem
Neural Model: GLM
E(r_t) = \lambda_t = \exp\Big(\sum_i k_i x_{t,i} + \sum_j a_j r_{t-j}\Big)

x_{t,i} = stimulus at time t and location i
\lambda_t = firing rate at time t
k_i = stimulus filter coefficients
a_j = spike history coefficients

We model a neuron using a generalized linear model whose output is the expected firing rate. The nonlinear stage is the exponential function, which also ensures that the log likelihood is a concave function of θ.
GLM
Computationally tractable
1. log likelihood is concave
2. log likelihood is 1-dimensional
p(r_t \mid x_t, \theta) = \text{Poisson}(\lambda_t), \qquad \lambda_t = \exp\Big(\sum_i k_i x_{t,i}\Big)

\log p(r_t \mid x_t, \theta) \propto -\exp(\theta \cdot x_t) + r_t\,(\theta \cdot x_t)
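A minimal NumPy sketch of this model (function and variable names are mine, for illustration only): it draws one trial's spike count from the conditional intensity and evaluates the concave, effectively 1-d log likelihood.

```python
import numpy as np

def simulate_glm_trial(k, a, x_t, spike_history, rng=None):
    """One trial of the Poisson GLM above: lambda_t = exp(k . x_t + sum_j a_j r_{t-j}).
    k: stimulus filter, a: spike-history filter, x_t: current stimulus,
    spike_history: the most recent responses r_{t-1}, r_{t-2}, ... (same length as a)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.exp(k @ x_t + a @ spike_history)   # conditional intensity (expected firing rate)
    return rng.poisson(lam), lam                 # observed spike count and the rate

def log_likelihood(theta, x_t, r_t):
    """Poisson log likelihood, up to the constant log(r_t!): it depends on theta
    only through the scalar projection theta . x_t and is concave in theta."""
    proj = theta @ x_t
    return -np.exp(proj) + r_t * proj
```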
Updating the Posterior
1. Approximate the posterior p(θ | \vec{x}_t, \vec{r}_t) as Gaussian.
   • The posterior is a product of log-concave functions
   • The posterior distribution is asymptotically Gaussian
2. Use a Laplace approximation to determine the parameters of the Gaussian, μ_t, C_t.
   • μ_t = peak of the posterior
   • C_t = negative inverse Hessian of the log posterior, evaluated at the peak
Updating the Posterior
3. Update is rank 1
4. Find the peak: Newton’s method in 1-d
5. Invert the Hessian: use the Woodbury lemma, O(d²) time
[Figure: log prior + log likelihood = log posterior]
Choosing the optimal stimulus
• Maximize the mutual information ⇔ minimize the posterior entropy
• Posterior is Gaussian: Entropy ∝ log|C|
• Compute the expected determinant
  – Simplify using matrix perturbation theory
• Result: maximize an expression for the expected Fisher information
• Maximization strategy (see the sketch after the Maximization II slide below)
  – Impose a power constraint on the stimulus
  – Perform an eigendecomposition
  – Simplify using Lagrange multipliers
  – Find the solution by performing a 1-d numerical optimization
• Bottleneck: the eigendecomposition, which takes O(d²) in practice
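The "Entropy ∝ log|C|" step is just the differential entropy of a multivariate Gaussian; written out in the slides' notation:

```latex
% Entropy of the Gaussian posterior N(\mu_t, C_t) in d dimensions: only the
% log-determinant term depends on the chosen stimuli, so minimizing the expected
% posterior entropy is equivalent to minimizing E[\log\det C_{t+1}].
H\big(\theta \mid \vec{x}_t, \vec{r}_t\big)
  = \tfrac{1}{2}\log\!\big((2\pi e)^d \det C_t\big)
  = \tfrac{d}{2}\log(2\pi e) + \tfrac{1}{2}\log\det C_t
```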
Running Time
1. Updating the posterior: O(d²)
2. Eigendecomposition: O(d²)
3. Choosing the stimulus: O(d)
(d = stimulus dimensionality)
Simulation Setup
Compare: Random vs.
Information maximizing stimuli
Objective: learn parameters
A Gabor Receptive Field
• High-dimensional: 25×33 pixels
• Info. max. converges to the true receptive field
• Converges faster than random sampling
Non-stationary parameters
• Biological systems are non-stationary
– Degradation of the preparation
– Fatigue
– Attentive state
• Use a Kalman-filter-type approach
• Model slow changes using diffusion: θ_{t+1} = θ_t + N(0, Q) (a sketch follows below)
Non-stationary parameters
• The θ_i follow a Gaussian curve whose center moves randomly over time

Non-stationary parameters
• Assuming θ is constant overestimates certainty ⇒ poor choices for optimal stimuli
Conclusions
1. An efficient implementation is achievable with:
   a. Model-based approximations (the model is specific but reasonable)
   b. A Gaussian approximation of the posterior (justified by the theory)
   c. Reduction of the optimization to a 1-d problem
2. Assumptions are weaker than typically required for system identification in high dimensions
3. Efficiency could permit system identification in previously intractable systems
References
1. A. Watson, et al., Perception and Psychophysics 33, 113 (1983).
2. M. Berry, et al., J. Neurosci. 18, 2200 (1998).
3. L. Paninski, Neural Computation 17, 1480 (2005).
4. P. McCullagh, et al., Generalized Linear Models (Chapman and Hall, London, 1989).
5. L. Paninski, Network: Computation in Neural Systems 15, 243 (2004).
6. E. Simoncelli, et al., in The Cognitive Neurosciences, M. Gazzaniga, ed. (MIT Press, 2004), third edn.
7. M. Gu, et al., SIAM Journal on Matrix Analysis and Applications 15, 1266 (1994).
8. E. Chichilnisky, Network: Computation in Neural Systems 12, 199 (2001).
9. F. Theunissen, et al., Network: Computation in Neural Systems 12, 289 (2001).
10. L. Paninski, et al., Journal of Neuroscience 24, 8551 (2004).
Acknowledgements
This work was supported by the Department of Energy Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security Administration in the Department of Energy under contract DE-FG02-97ER25308, and by the NSF IGERT Program in Hybrid Neural Microsystems at Georgia Tech via grant number DGE-0333411.
Spike history
E(r_t) = \lambda_t = \exp\Big(\sum_i k_i x_{t,i} + \sum_j a_j r_{t-j}\Big)

[Figure: posterior mean after 500 trials; stimulus filter and spike history filter]
Previous Work
System Identification
1. Minimize variance of parameter estimate
• Deciding among a menu of experiments which to conduct [Flaherty 05]
2. Maximize divergence of predicted responses for competing models [Dunlop 06]
Optimal Encoding
1. Maximize the mutual information between input and output [Machens 02]
2. Maximize the response
   • Hill-climbing to find stimuli to which monkey V1 neurons respond strongly [Foldiak 01]
   • Efficient stimuli for cat auditory cortex [Nelken 01]
3. Minimize stimulus reconstruction error [Edin 04]
Derivation of Choosing the Stimulus I
We choose the stimulus by maximizing the conditional mutual information between the response and θ:

x_{t+1} = \arg\max_{x_{t+1}} I(r_{t+1}; \theta \mid \vec{x}_t, \vec{r}_t, x_{t+1})

I(r_{t+1}; \theta \mid \vec{x}_t, \vec{r}_t, x_{t+1}) = H(\theta \mid \vec{x}_t, \vec{r}_t) - H(\theta \mid \vec{x}_{t+1}, \vec{r}_{t+1})

H(\theta \mid \vec{x}_{t+1}, \vec{r}_{t+1}) = \tfrac{1}{2}\, E_{r_{t+1} \mid x_{t+1}}\big[\log\det C_{t+1}\big] + \text{const}

C_{t+1}^{-1} = C_t^{-1} + J_{obs}(r_{t+1}, x_{t+1})

Neglecting higher-order terms, we just need to maximize the quantity derived on the next slide.
Derivation of Choosing the Stimulus II
\big(C_t^{-1} + J_{obs}(r_{t+1}, x_{t+1})\big)^{-1} \approx C_t - C_t J_{obs} C_t

E_r\big[\log|C_{t+1}|\big] \approx \log|C_t| - E_r\big[\operatorname{tr}(J_{obs} C_t)\big] + o(J_{obs} C_t)

So we just need to minimize -E_r[tr(J_{obs} C_t)]; therefore we need to maximize

E_r\big[\operatorname{tr}(J_{obs} C_t)\big] = E_\theta\big[J_t(\theta)\big]\, x_{t+1}^T C_t x_{t+1} = \exp\!\big(\mu_t^T x_{t+1}\big)\, \exp\!\big(\tfrac{1}{2}\, x_{t+1}^T C_t x_{t+1}\big)\, x_{t+1}^T C_t x_{t+1}

where J_t(\theta) = \exp(\theta^T x_{t+1}) is the expected Fisher information.
Maximization
E\big[\operatorname{tr}(J_{obs} C_t)\big] = F(x_{t+1}) = \exp\!\big(\mu_t^T x_{t+1}\big)\, \exp\!\big(\tfrac{1}{2}\, x_{t+1}^T C_t x_{t+1}\big)\, x_{t+1}^T C_t x_{t+1}

To maximize this expression, we express everything in terms of the eigenvectors of C_t. Let \vec{u} and \vec{y} be the projections of the mean \mu_t and the stimulus x_{t+1} onto the eigenvectors, and let c_i be the eigenvalues:

F(x_{t+1}) = \exp\!\Big(\sum_i u_i y_i\Big)\, \exp\!\Big(\tfrac{1}{2} \sum_i c_i y_i^2\Big)\, \sum_i c_i y_i^2

We maximize the above subject to a power constraint by breaking it up into an inner and an outer problem:

\arg\max_{x_{t+1}} F(x_{t+1}) = \arg\max_{b}\, \exp(b)\, \Big[\max_{\vec{y}:\, \|\vec{y}\|^2 = e_t^2,\; \sum_i u_i y_i = b} \exp\!\Big(\tfrac{1}{2} \sum_i c_i y_i^2\Big) \sum_i c_i y_i^2\Big]
Maximization II
We maximize the inner problem using Lagrange multipliers:

\arg\max_{\vec{y}:\, \|\vec{y}\|^2 = e_t^2,\; \sum_i u_i y_i = b} \sum_i c_i y_i^2 \;\Rightarrow\; y_i = \frac{\lambda_2\, u_i}{2(\lambda_1 - c_i)}

To find the global maximum we perform a 1-d search over λ_1; for each λ_1 we compute F(y(λ_1)) and then choose the stimulus which maximizes F(y(λ_1)).
Posterior Update: Math
t
t
t
t
t
t
t
t
t
tt
r
x
r
x
r
x
CC
µθ
θ
=
−−−
−
−
−


















+=
...
Speaker notes
• Explain the experimental paradigm.
  Paninski still thinks this slide is not very good; he thinks error bars might help, and that you should be clear about what you want to say.
  Including spike history just means that in our likelihood we apply a linear filter to the spike history; the output of this linear filter is then summed with the output of the filter applied to the stimulus, and this summed value is pushed through a nonlinearity to give the firing rate.
  2. Here we compare the two methods by looking at the accuracy of the estimates after 500 iterations.
  Want to make the point that we can't control spike history terms, unlike stimulus terms; optimizing the stimuli for the stimulus coefficients is beneficial even if there are spike history terms. We only maximized the information about the stimulus coefficients.
  2. Plots show the mean.
  Spike history terms allow you to model refractoriness, burstiness, and adaptation.
  We also do better at estimating the spike history coefficients, because we can better estimate the contribution due to the stimulus coefficients.
  An incorrect model (i.e., ignoring spike history terms when fitting the model) leads to inaccurate predictions. Not surprising: the spike history terms are inhibitory, so we would underestimate the excitation and overestimate the inhibitory stimulus coefficients, and this is what we see.