Jennie sinsfadp06

  • Cross-validation accuracy boxplots for both calibration and brain control data sets. Typically 20 runs of randomized 5-fold cross-validation were performed for each data set. The filled boxes are for brain control; the non-filled ones are for calibration. Each box shows the lower quartile, median, and upper quartile values of accuracy. Note that for R3, R5/1, and R5/2 there are fewer than 30 trials in each brain control data set, so the range of accuracy is large.

    1. Gradient Algorithms, Robustness, and Partial Observability - In the context of Cortical Neural Control using Rat Model
        Jennie Si, Department of Electrical Engineering, Arizona State University
        si@asu.edu NSF ADP 2006
    2. Motivation/Challenge/Societal Impact
        • Introduce an interesting platform to study the higher functions of the brain (the frontal cortical area and the motor area) in decision and control using designed control tasks
        • Use systems tools (ADP, MDP, CI…) to understand some fundamental science questions
        • Need to develop new tools: technology-centered designs and theory-centered analysis
        • Inspire new ways of thinking about complex systems
    3. Background on cortical motor control
        • Center-out task and preferred direction
        • Population coding of movement direction and speed
        • Motor cortical neural activity as a predictive signal, preceding movement onset
        • Brain-machine interface: open-loop vs. closed-loop solution
    4. Cortical neural signal extraction: non-invasive vs. invasive recording
        • EEG
            – Rhythms β and μ, P300, slow cortical potential (SCP)
            – Sampling rate 200-1000 Hz
            – Number of channels: from 1 or 2 up to 128 or 256
        • Electrodes
            – Bioactive (allowing growth of nerve) or bio-inactive; multiple microwires or multichannel electrode arrays
            – Superficial motor areas or deep brain structures
            – Primary motor, parietal, premotor, frontoparietal, basal ganglia
    5. Cortical neural signal extraction: ECoG
        [Figure panels: electrodes for online control are circled; spectral correlations of ECoG with target location (color encodes patients); conditions shown: resting, imagining saying the word ‘move’]
        (d) Imagery is associated with a decrease in µ (8–12 Hz) and β (18–26 Hz) bands.
        Leuthardt et al. (2004) A brain–computer interface using electrocorticographic signals in humans. J. Neural Eng. 1:63-71
    6. Chapin et al. (1999)
        • Motor and thalamic regions
        • Used a large number (40-60) of neurons
        • Regressed the position of a water dripper arm
        • Used a recurrent neural network
        Chapin, J.K.; Moxon, K.A.; Markowitz, R.S.; and Nicolelis, M.A.L. (1999) Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nature Neurosci., 2:664-670.
    7. a, b, Trial examples showing the movement by hand (green) and by neural reconstruction (blue) of a cursor to a target (red). Dotted outlines represent the actual circumference of the target and cursor on the screen. In a, hand motion resembles the neurally controlled cursor path; in b, no manipulandum motion occurred, but the neurally controlled cursor reached the target. Each dot represents an estimate of position, updated at 50-ms intervals. Axes are in x, y screen coordinates (1,000 units corresponds to a visual angle of 3.5°); note that the two trials take place in different parts of the workspace.
        • Serruya, Hatsopoulos, Paninski, Fellows & Donoghue. Instant neural control of a movement signal. Nature 416 (6877): 141-142, Mar 14 2002
            – Monkey, Utah array, motor cortex
            – 2D cursor position and velocity, linear and Kalman filters
            – A few (7–30) MI neurons
            – Careful calibration can lead to reasonable control without excessive training
    8. • Taylor, Dawn M., Tillery, Stephen I. Helms, Schwartz, Andrew B. Direct Cortical Control of 3D Neuroprosthetic Devices. Science 2002 296: 1829-1832
            – Monkey, microwire, motor and pre-motor cortex
            – 3D cursor velocity, adaptive version of population vectors
            – Showed that small numbers of neurons can be used to control a three-dimensional cursor, and that neurons trained to control a cursor can control a real robot for feeding
    9. • Carmena JM, Lebedev MA, Crist RE, et al. Learning to control a brain-machine interface for reaching and grasping by primates. PLoS Biology 1 (2): 193-208, Nov 2003
            – Monkey
            – High-density array of 128 microwires; motor, premotor, supplementary motor, posterior parietal, and sensory cortex
            – 2D cursor position and velocity and gripping force, linear filters
    10. Musallam et al. (2004)
        - Parietal reach region (PRR)
        - Cognition-based prosthetic: goal rather than trajectory
        - Performance improved over a period of weeks.
        - Expected value signals related to fluid preference, the expected magnitude, or probability of reward were decoded simultaneously with the intended goal.
        Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., and Andersen, R. A. (2004). "Cognitive Control Signals for Neural Prosthetics", Science, Vol 305, Issue 5681, 258-262
    11. Driving tasks
        • The arena for training rats to drive the robot towards one of the lights
    12. Question asked
        • How does the rat develop a control strategy to complete the driving tasks (under different time scales and spatial complexity)?
    13. Neuroscientific evidence
        • Multimodal association area: the anterior association area (prefrontal cortex) integrates different sensory modalities and links them to action
        • Macaque and rat prefrontal cortex receives multimodal cortico-cortical projections from motor, somatosensory, visual, auditory, gustatory, and limbic cortices
        • Prefrontal areas provide cognitive, sensory, or motivational inputs for motor behavior (rostral region in rat)
        • Motor areas are concerned with more concrete aspects of movement (caudal region in rat)
    14. One step at a time…
        First, a directional control task with only high-level control commands
    15. The Brain-Controlled Vehicle
        [Diagram blocks: Neural Signals; Neural Interface; Signal Processing Algorithms/Command Extraction; Directional control; Control Command; Vehicle State Signal; Vehicle Sensors; Environmental Feedback]
    16. Goals
        • To decode the directional control decision as a predictive signal from motor cortical neural activities
        • To associate motor neural activities with motor behavior, and thus to develop models that may help interpret the neural mechanism of cortical motor directional control
    17. Subjects and implants
        • Male Sprague-Dawley rats
        • 2×4 arrays of 50 µm tungsten wires coated with polyimide
        • Wires spaced 500 µm apart, for an array size of approximately 1.5 mm × 0.5 mm
        • The implant site targets the rostral region
        [Figure from Kolb, The Cerebral Cortex of the Rat, 1990]
    18. Brain Control Diagram
        [Diagram blocks: Neural Signals; Recording System; Spike times; Binned Data (Neuron 1 ··· Neuron L, each with Bin 1 ... K); Computation of Neural Activity Vector (NAV); Directional Control Decision (−1, Left; +1, Right); Task Execution; Feedback - Visual, Auditory & Reward]
        NAV: a K × L dimensional vector
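The binning step in the diagram above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `neural_activity_vector`, the window limits, and the spike times are all hypothetical, and the slide does not say how the bins are placed, so equal-width bins over a fixed window are assumed.

```python
import numpy as np

def neural_activity_vector(spike_times, t_start, t_stop, n_bins):
    """Count spikes per bin for each neuron and concatenate into one K*L vector."""
    # Assumption: K equal-width bins spanning [t_start, t_stop].
    edges = np.linspace(t_start, t_stop, n_bins + 1)
    counts = [np.histogram(st, bins=edges)[0] for st in spike_times]
    return np.concatenate(counts)

# Two hypothetical neurons (L = 2) and K = 4 bins over a 2-second window.
spikes = [np.array([0.1, 0.2, 1.7]), np.array([0.6, 1.1, 1.2, 1.9])]
nav = neural_activity_vector(spikes, 0.0, 2.0, 4)
print(nav)
```

The resulting vector has K × L entries, which is the input a left/right decision rule would consume.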
    19. Perievent Histograms
        [Figure: perievent histograms for Rdar36, Left Hits and Right Hits, for units sig001a, sig002a, sig003a, sig003b, sig004a, sig004b, sig005a, sig005b, sig006a, sig007a, sig007b, sig008a. x-axis: Time (sec), from -2 to 2; y-axis: counts/bin.]
    20. Cross-validation accuracy boxplots for manual and brain control respectively, 5 rats, 8 data sets
        [Figure: "CV accuracy, Calibration and Brain control, all neurons". x-axis: Rat/Day (R1, R2, R3, R4/1, R4/2, R4/3, R5/1, R5/2); y-axis: CV Accuracy. Legend: Calib 25/75, Calib median, Brain 25/75, Brain median.]
        • Each box shows the 25-75 quartile and median values of accuracy.
        • For R3, R5/1, and R5/2 there are fewer than 30 trials in each brain control data set.
        • Typically 20 runs of randomized 5-fold cross-validation were performed for each data set.
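The evaluation protocol described above (20 runs of randomized 5-fold cross-validation, summarized by quartiles) can be sketched as below. The slides do not name the classifier, so a nearest-centroid rule is used purely as a stand-in, and the data are synthetic; every name here is hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier: assign each test point to the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]

def repeated_kfold_accuracy(X, y, n_runs=20, n_folds=5, seed=0):
    """Return one cross-validated accuracy per randomized run."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        correct = 0
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            pred = nearest_centroid_predict(X[train], y[train], X[test])
            correct += int((pred == y[test]).sum())
        accuracies.append(correct / len(y))
    return np.array(accuracies)

# Synthetic stand-in data: 30 "left" (-1) and 30 "right" (+1) trials, 8 features.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-3.0, 1.0, (30, 8)), data_rng.normal(3.0, 1.0, (30, 8))])
y = np.array([-1] * 30 + [1] * 30)
accs = repeated_kfold_accuracy(X, y)
lower_q, median, upper_q = np.percentile(accs, [25, 50, 75])
```

The three percentiles are the quantities each box in the plot reports for one data set.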
    21. Modeling rat’s directional control using MDP?
        MDPs:
        • Finite state space $S = \{1, 2, \dots, n\}$
        • Finite action space $A_i = \{a_{1_i}, a_{2_i}, \dots, a_{m_i}\}$
        • Infinite decision horizon $T = \{0, 1, 2, 3, \dots\}$
        • Cost function $c(i, a)$
        • Discount factor $\gamma$ $(0 < \gamma < 1)$
        • Action mapping $a : S \to A_i$, $a(i) \in A_i$
        • Stationary controller policy $\pi = (a, a, \dots)$, $\pi \in \Pi_s$
    22. Manual lever press following cue
        Brain control - “imaginary lever press” following cue
    23. Possible implementation
        Define 6 possible states:
        • Idle – between two trials
        • Ready – right before trial start
        • Reward – success of a trial
        • No-Reward – failure of a trial
        • Left experiment state – left cue experiment
        • Right experiment state – right cue experiment
        The action (control) is the rat’s volition, represented by the corresponding neural activities.
        Going from one state to another depends on the current state as well as the action taken.
        • The reward can be stated as r(LL) = 1; r(LR) = −1 … r(RR) = 1; r(RL) = −1 …
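A minimal sketch of the six states and the reward rule listed above. It assumes that in r(LL), r(LR), and so on the first letter is the cue and the second is the decoded action (the rule is symmetric, so the reverse reading gives the same values); the `step` function and all names are hypothetical and model only the outcome of a cued trial.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    READY = "ready"
    LEFT_CUE = "left experiment"
    RIGHT_CUE = "right experiment"
    REWARD = "reward"
    NO_REWARD = "no-reward"

def reward(cue, action):
    """r(LL) = r(RR) = 1 and r(LR) = r(RL) = -1."""
    return 1 if cue == action else -1

def step(state, action):
    """Hypothetical transition out of an experiment state given action 'L' or 'R'."""
    cue = {State.LEFT_CUE: "L", State.RIGHT_CUE: "R"}.get(state)
    if cue is None:
        raise ValueError("only the two experiment states are modeled in this sketch")
    r = reward(cue, action)
    return (State.REWARD if r > 0 else State.NO_REWARD), r

print(step(State.LEFT_CUE, "L"))
print(step(State.RIGHT_CUE, "L"))
```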
    24. Does this tell us more?
        • “Open loop” discrimination and CV analysis provide a baseline for relating neural activity (spike trains) to behavioral parameters (left/right decision)
        • As a decoding tool, can an MDP model tell us more than “open loop” analysis?
        • MDP model to explain the experiment as a decision process
    25. Technicalities
        • How to represent control (start/stop and bin size)
            – Trial and error; hard to formulate theoretically
        • How to compute the transition matrix given uncertainty and partially observed sequences of spike trains
            – We can try to formulate this theoretically…
    26. • Uncertain transition matrices
            – Robust value iteration (Nilim & El Ghaoui, 2005)
            – Robust policy iteration (Satia & Lave, 1973)
    27. Problem formulation
        • Classification of uncertain transition matrices
            – Expression of uncertain transition matrices
        $$P = \begin{bmatrix} P_1^{a_{1_1}} \\ \vdots \\ P_i^{a_{j_i}} \\ \vdots \\ P_n^{a_{m_n}} \end{bmatrix} = \begin{bmatrix} f_1^{a_{1_1}}(U) \\ \vdots \\ f_i^{a_{j_i}}(U) \\ \vdots \\ f_n^{a_{m_n}}(U) \end{bmatrix}, \qquad P^\pi = \begin{bmatrix} P_1^{a(1)} \\ \vdots \\ P_n^{a(n)} \end{bmatrix} = \begin{bmatrix} f_1^{a(1)}(U) \\ \vdots \\ f_n^{a(n)}(U) \end{bmatrix}$$
        $$\mathcal{P} = \{ P : U \in \mathcal{U} \}$$
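    28. Problem formulation
        • Classification of uncertain transition matrices
            – Definition of uncertain transition matrices
        The transition matrix $P$ is correlated if
        $$\mathcal{P} \subset \mathcal{P}_1^{a_{1_1}} \times \dots \times \mathcal{P}_i^{a_{j_i}} \times \dots \times \mathcal{P}_n^{a_{m_n}}$$
        The transition matrix $P$ is independent if
        $$\mathcal{P} = \mathcal{P}_1^{a_{1_1}} \times \dots \times \mathcal{P}_i^{a_{j_i}} \times \dots \times \mathcal{P}_n^{a_{m_n}}$$
        $\mathcal{P}_i^{a_{j_i}}$ is the projection of $\mathcal{P}$ on the direction of $P_i^{a_{j_i}}$ $(i \in S,\ a_{j_i} \in A_i)$.
        $\mathcal{P}^\pi$ is the projection of $\mathcal{P}$ on the direction of $\{P_1^{a(1)}, P_2^{a(2)}, \dots, P_n^{a(n)}\}$.
        [Figure: sets in the (x, y) plane with intervals $I_1$ and $I_2$; $S_1 = I_1 \times I_2$, $S_2 \subset I_1 \times I_2$]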
    29. Problem formulation
        • Classification of MDPs
            – MDPs with independent transition matrices
            – MDPs with correlated transition matrices
        • Optimality criterion
            – Minimizing maximum value function for any initial state
        $$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i) = v^*(i) \quad \forall i \in S$$
        • Stationary optimal policy pair: $(\pi^*, P^*)$ is optimal if, for any initial state $i \in S$,
        $$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i)$$
    30. Problem formulation
        • MDPs with independent transition matrices
            – An optimal policy pair exists
            – Robust value iteration and robust policy iteration are applicable
        • MDPs with correlated transition matrices (three possible cases)
            – An optimal policy pair exists and both iterations are applicable
            – An optimal policy pair exists but both iterations are no longer applicable
            – An optimal policy pair does not exist
    31. Questions to be answered
        • Sufficient conditions to guarantee that robust value iteration and robust policy iteration are applicable;
        • An optimality criterion under which a stationary optimal policy pair exists under a weak condition;
        • An efficient algorithm.
    32. Sufficient conditions
        Lemma. For any given $\pi = (a, a, \dots) \in \Pi_s$ and any given $q \in \Re_+^{1 \times n}$,
        $$\max_{v \in \Re^{n \times 1}} \left\{ qv : v(i) \le (g^\pi(v))_i := c(i, a(i)) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v,\ i \in S \right\} \quad (1)$$
        For any given $q \in \Re_+^{1 \times n}$,
        $$\max_{v \in \Re^{n \times 1}} \left\{ qv : v(i) \le (g(v))_i := \min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v \right],\ i \in S \right\} \quad (2)$$
        The functions $g^\pi$ and $g$ are monotone non-decreasing and contractive. Problems (1) and (2) have unique optimal solutions, denoted $v_\infty^\pi$ and $v_\infty$, which are the unique solutions to the fixed-point equations $v = g^\pi(v)$ and $v = g(v)$, respectively. The optimal transition probability rows are given by
        $$(P_i^{a(i)})^* \in \arg\max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} \left\{ P_i^{a(i)} v_\infty^\pi \right\},\ i \in S, \text{ which constitute } (P^\pi)^* \quad (3)$$
        $$(P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_\infty \right\},\ i \in S,\ a \in A_i, \text{ which constitute } P^* \quad (4)$$
    33. Sufficient conditions
        Iterations for obtaining $v_\infty^\pi$:
        (1) select $v_0^\pi \in \Re^{n \times 1}$ and set $k = 0$;
        (2) compute $v_{k+1}^\pi$ by $v_{k+1}^\pi = g^\pi(v_k^\pi)$;
        (3) terminate if $v_{k+1}^\pi = v_k^\pi$ and output $v_\infty^\pi = v_k^\pi$; otherwise, set $k = k + 1$ and go to (2).
        Iterations for obtaining $v_\infty$:
        (1) select $v_0 \in \Re^{n \times 1}$ and set $k = 0$;
        (2) compute $v_{k+1}$ by $v_{k+1} = g(v_k)$;
        (3) terminate if $v_{k+1} = v_k$ and output $v_\infty = v_k$; otherwise, set $k = k + 1$ and go to (2).
    34. Sufficient conditions
        Theorem. Suppose that, for any $\pi \in \Pi_s$, $(P^\pi)^*$ defined by (3) is in the set $\mathcal{P}^\pi$, and that $P^*$ defined by (4) is in the set $\mathcal{P}$. Then:
        i) A stationary optimal policy pair exists under the optimality criterion of minimizing maximum value function for any initial state;
        ii) Robust value iteration is applicable;
        iii) Robust policy iteration is applicable.
    35. Robust value iteration
        1. Select $v_0 \in \Re^n$ and set $k = 0$;
        2. Compute $v_{k+1}$ by
        $$v_{k+1}(i) = \min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right]$$
        3. If $v_{k+1} = v_k$, then go to 4; otherwise increment $k$ by 1 and go to 2;
        4. Compute $\pi^* = (a^*, a^*, \dots)$ and $P^*$ defined by
        $$a^*(i) \in \arg\min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right], \qquad (P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_k \right\}$$
        5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.
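A minimal sketch of the robust value iteration steps above, under assumptions that are not in the slides: each uncertainty set is a finite list of candidate rows, the rows are independent (so the membership check of step 5 holds automatically and is omitted), the exact test $v_{k+1} = v_k$ is replaced by a tolerance, and the two-state MDP is hypothetical.

```python
import numpy as np

def robust_backup(v, cost, row_sets, gamma):
    """Return Q[i][a] = c(i,a) + gamma * max over candidate rows of (row . v)."""
    return [[cost[i][a] + gamma * max(float(np.dot(p, v)) for p in row_sets[i][a])
             for a in range(len(cost[i]))] for i in range(len(cost))]

def robust_value_iteration(cost, row_sets, gamma, tol=1e-10, max_iter=10_000):
    v = np.zeros(len(cost))                      # step 1: v_0 = 0
    for _ in range(max_iter):
        q = robust_backup(v, cost, row_sets, gamma)
        v_new = np.array([min(qi) for qi in q])  # step 2: min over actions
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:                            # step 3, with a tolerance
            break
    q = robust_backup(v, cost, row_sets, gamma)
    policy = [int(np.argmin(qi)) for qi in q]    # step 4: greedy controller
    # Step 4, nature's side: worst-case row for every (state, action) pair.
    worst_rows = [[max(row_sets[i][a], key=lambda p: float(np.dot(p, v)))
                   for a in range(len(cost[i]))] for i in range(len(cost))]
    return v, policy, worst_rows

# Hypothetical 2-state, 2-action MDP; each entry lists candidate rows P_i^a.
cost = [[1.0, 3.0], [4.0, 6.0]]
row_sets = [
    [[np.array([1.0, 0.0]), np.array([0.5, 0.5])], [np.array([1.0, 0.0])]],
    [[np.array([0.0, 1.0])], [np.array([1.0, 0.0]), np.array([0.5, 0.5])]],
]
v, policy, worst_rows = robust_value_iteration(cost, row_sets, gamma=0.5)
```

For correlated uncertainty sets, the rows in `worst_rows` would still have to be checked against the joint set before the pair can be reported as optimal.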
    36. Robust policy iteration
        1. Initialization: select $\pi_0 = (a_0, a_0, \dots) \in \Pi_s$ and set $k = 0$;
        2. Policy evaluation: do the iteration for $v_\infty^{\pi_k}$;
        3. Policy improvement: find $\pi_{k+1} = (a_{k+1}, a_{k+1}, \dots)$ with
        $$a_{k+1}(i) \in \arg\min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_\infty^{\pi_k} \right]$$
        4. If $\pi_{k+1} = \pi_k$, compute $P^*$ by
        $$(P_i^a)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_\infty^{\pi_k} \right\} \quad \forall i \in S,\ a \in A_i$$
        and go to 5; otherwise increment $k$ by 1 and go to 2;
        5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.
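The policy evaluation and improvement loop above can be sketched as follows. As assumptions for illustration only: uncertainty sets are finite lists of independent candidate rows, evaluation stops on a tolerance rather than exact equality, ties keep the current action so the loop cannot oscillate, and the two-state MDP is hypothetical.

```python
import numpy as np

def worst_case(rows, v):
    """Nature's choice: the largest expected cost-to-go over the candidate rows."""
    return max(float(np.dot(p, v)) for p in rows)

def evaluate(policy, cost, row_sets, gamma, tol=1e-12, max_iter=10_000):
    """Step 2: iterate v <- g^pi(v) for the fixed policy."""
    v = np.zeros(len(cost))
    for _ in range(max_iter):
        v_new = np.array([cost[i][a] + gamma * worst_case(row_sets[i][a], v)
                          for i, a in enumerate(policy)])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

def robust_policy_iteration(cost, row_sets, gamma, initial_policy):
    policy = list(initial_policy)                # step 1
    while True:
        v = evaluate(policy, cost, row_sets, gamma)
        new_policy = []
        for i in range(len(cost)):               # step 3: greedy improvement
            q = [cost[i][a] + gamma * worst_case(row_sets[i][a], v)
                 for a in range(len(cost[i]))]
            best = int(np.argmin(q))
            if q[policy[i]] <= q[best] + 1e-9:   # keep the current action on ties
                best = policy[i]
            new_policy.append(best)
        if new_policy == policy:                 # step 4: policy is stable
            return policy, v
        policy = new_policy

# Hypothetical 2-state, 2-action MDP; each entry lists candidate rows P_i^a.
cost = [[1.0, 3.0], [4.0, 6.0]]
row_sets = [
    [[np.array([1.0, 0.0]), np.array([0.5, 0.5])], [np.array([1.0, 0.0])]],
    [[np.array([0.0, 1.0])], [np.array([1.0, 0.0]), np.array([0.5, 0.5])]],
]
policy, v = robust_policy_iteration(cost, row_sets, gamma=0.5, initial_policy=[1, 1])
```

Starting from the deliberately poor policy `[1, 1]`, a single improvement step already changes both actions, and the second evaluation confirms that the new policy is stable.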
    37. Sufficient conditions
        Example
        $S = \{1, 2\}$, $A_1 = A_2 = \{a_1, a_2\}$
        $$P = \begin{bmatrix} P_1^{a_1} \\ P_1^{a_2} \\ P_2^{a_1} \\ P_2^{a_2} \end{bmatrix} = \begin{bmatrix} u_1 & 1 - u_1 \\ u_3 & 1 - u_3 \\ 1 - u_2 & u_2 \\ 1 - u_4 & u_4 \end{bmatrix}, \qquad \begin{aligned} c(1, a_1) &= 1 \\ c(1, a_2) &= 2 \\ c(2, a_1) &= 3 \\ c(2, a_2) &= 4 \end{aligned}$$
        $U = \{u_1, u_2, u_3, u_4\}$, $W = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$
        $\mathcal{U} = \{U : u_1 = u_3,\ u_2 = u_4;\ u_1, u_4 \in W\}$ ⇒ correlated transition matrix $P$; independent transition matrix $P^\pi$ for each $\pi$
        Optimal controller policy $\pi^* = (a^*, a^*, \dots)$ with $a^*(1) = a_1$, $a^*(2) = a_1$
        Optimal nature policy
        $$P^* = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \in \mathcal{P}$$
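The example above is small enough to check by brute force: enumerate the four stationary policies and the 36 admissible values of $U$, solve $v = c^\pi + \gamma P^\pi v$ for each, and compare worst cases. This is a sketch rather than the robust iterations themselves, and the slide gives no discount factor, so $\gamma = 0.9$ is an assumption.

```python
import itertools
import numpy as np

GAMMA = 0.9  # assumption: the slide does not state a discount factor
W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST = {(1, "a1"): 1.0, (1, "a2"): 2.0, (2, "a1"): 3.0, (2, "a2"): 4.0}

def value(policy, u1, u4):
    """Solve v = c + GAMMA * P v for a fixed policy and a fixed U."""
    # u1 = u3 and u2 = u4, so each state's row is the same for both actions.
    P = np.array([[u1, 1.0 - u1], [1.0 - u4, u4]])
    c = np.array([COST[(1, policy[0])], COST[(2, policy[1])]])
    return np.linalg.solve(np.eye(2) - GAMMA * P, c)

policies = list(itertools.product(["a1", "a2"], repeat=2))
values = {p: {(u1, u4): value(p, u1, u4) for u1 in W for u4 in W} for p in policies}
# Worst case over nature's choices, taken separately for each initial state.
worst = {p: np.max(list(values[p].values()), axis=0) for p in policies}
# Policies whose worst case is no larger than every other policy's, state by state.
best = [p for p in policies
        if all(np.all(worst[p] <= worst[q] + 1e-9) for q in policies)]
# Choices of (u1, u4) that attain the worst case for both initial states at once.
nature = [uw for uw, v in values[best[0]].items() if np.allclose(v, worst[best[0]])]
print(best, nature, worst[best[0]])
```

Under this assumed discount factor the enumeration selects $(a_1, a_1)$ with $u_1 = 0$ and $u_4 = 1$, that is, every row of $P^*$ equal to $[0\ 1]$, in agreement with the slide.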
    38. New optimality criterion
        • Minimizing maximum squared total value function
        $$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \| V_P^\pi \|^2 \quad (5)$$
        where $\| V_P^\pi \|^2 = (V_P^\pi)' V_P^\pi$ and the total value function is $V_P^\pi = \left( v_P^\pi(1) \ \cdots \ v_P^\pi(i) \ \cdots \ v_P^\pi(n) \right)'$
        • Stationary optimal policy pair: $(\pi^*, P^*)$ is optimal if
        $$\| V_{P^*}^{\pi^*} \|^2 = \max_{P \in \mathcal{P}} \| V_P^{\pi^*} \|^2 = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \| V_P^\pi \|^2$$
    39. New optimality criterion
        • Existence of stationary optimal policy pair
            Theorem: Assuming that $\max_{P \in \mathcal{P}} \| V_P^\pi \|^2$ exists for any $\pi$, a stationary optimal policy pair $(\pi^*, P^*)$ exists in terms of (5).
        • Relationship between the two optimality criteria
            The criterion of minimizing the maximum squared total value function generalizes the criterion of minimizing the maximum value function for any initial state.
    40. Robust policy iteration under total value function
        • Policy evaluation
            – Direct method
        $$\max_{P \in \mathcal{P}} \| V_P^\pi \|^2 = \max_{P \in \mathcal{P}} (C^\pi)' \left[ (I - \gamma P^\pi)^{-1} \right]' (I - \gamma P^\pi)^{-1} C^\pi$$
            – Iterative method: iteration for $v_\infty^\pi$
        [Figure: policy sets $\Pi_0$, $\Pi_1$, $\Pi_2$, $\Pi_3$ and $\pi^*$]
        • Policy improvement
            – Policy improvement in robust policy iteration
        $$a_{k+1}(i) \in \arg\min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right]$$
            – Controller policy elimination: necessary condition for an optimal policy $\pi$ at the k-th iteration
        $$\| V_{P^{\pi_k}}^{\pi} \|^2 \le \| V_{P^{\pi_k}}^{\pi_k} \|^2$$
    41. Algorithm of robust policy iteration under total value function
        1. Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $M = +\infty$ and select $\pi_0 = \{a_0, a_0, \dots\}$
        2. Policy evaluation:
            If the condition of iteration for $\pi_k$ is satisfied
                (a) use the "iterative method" to compute $P^{\pi_k} \in \mathcal{P}$ and $\| V_{P^{\pi_k}}^{\pi_k} \|^2$ such that $\| V_{P^{\pi_k}}^{\pi_k} \|^2 = \max_{P \in \mathcal{P}} \| V_P^{\pi_k} \|^2$
            Else
                (b) use the "direct method"
        3. Policy improvement:
            (a) eliminate controller policies: $\Pi'_k = \left\{ \pi \in \Pi_k : \| V_{P^{\pi_k}}^{\pi} \|^2 \le \| V_{P^{\pi_k}}^{\pi_k} \|^2 \right\}$
            If $|\Pi'_k| > 1$
                If the condition in the Theorem is satisfied
                    (b) set $\Pi_{k+1} = \Pi'_k$ and $M = \| V_{P^{\pi_k}}^{\pi_k} \|^2$, and select $\pi_{k+1} = \{a_{k+1}, a_{k+1}, \dots\} \in \Pi_{k+1}$ by $a_{k+1}(i) \in \arg\min_{a \in A_i} \left\{ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right\}$. If $\pi_{k+1} = \pi_k$, go to 4; otherwise, set $k = k + 1$ and go to 2
                Else
                    (c) if $\| V_{P^{\pi_k}}^{\pi_k} \|^2 < M$, set $M = \| V_{P^{\pi_k}}^{\pi_k} \|^2$ and $\Pi_{k+1} = \Pi'_k$, then select $\pi_{k+1} \ne \pi_k$, $\pi_{k+1} \in \Pi_{k+1}$, set $k = k + 1$ and go to 2; otherwise, select $\pi'_k \in \Pi'_k - \{\pi_k\}$, set $\Pi_k = \Pi'_k - \{\pi_k\}$ and $\pi_k = \pi'_k$, and go to 2
            Else
                (d) go to 4
        4. Termination: output $(\pi_k, P^{\pi_k})$ as a stationary optimal policy pair
    42. Remaining issues toward MDP model of the rat’s neural control strategy
        How to estimate uncertain stationary transition matrices in Markov decision processes using the experimental data collected from the rat’s cortical motor areas while he performed his control tasks?
        Proposed solution:
        • D-S theory of evidence is proposed as a new model for obtaining a set estimate of the stationary transition matrix
        • The mathematics has been worked out; it still needs to be implemented as algorithms and compared with existing models
        Is a POMDP model more feasible? How?
        More work is needed to give the rat’s cortical neural control mechanism a reasonable mathematical model
    43. Acknowledgement
        • Support by NSF under ECS-0002098 and ECS-0233529, and partially by General Dynamics
        • Support by ASU infrastructural funds
        • Byron Olson and Jing Hu for work on rat experiment and analysis
        • Baohua Li for robust dynamic programming results
        • Jiping He for help with experiments
        • Useful discussions with many (Dankert, L. Yang, C. Yang, Raghunathan …)
        • Lab support by many (Silver, Scanlan, Tian…)