
Presentation


Literature Colloquium

Published in: Education, Business, Technology


  1. A Learning Approach to Verification and Control of Stochastic Hybrid Systems
       • Literature Colloquium, May 28, 2012
       • Sofie Haesaert, DCSC
       • Supervisors: dr. ir. A. Abate and prof. dr. R. Babuška
       • Sections: Introduction; Stochastic Hybrid Systems; A Learning Approach; Discussion & Future Work
       • Footer: Honeywell.com, slide 1 / 25, Verification & Control of SHS
  2. Outline
       • Introduction: Air Traffic Safety and Control; Applications
       • Stochastic Hybrid Systems: Discrete-time Stochastic Hybrid Systems; Stochastic Hybrid Systems: Properties; Verification and Control
       • A Learning Approach: Current Methods: Dynamic Programming; Related Work
       • Discussion & Future Work
  3. Air Traffic Safety and Control
       • Flight safety: avoid other airplanes, bad weather conditions, restricted airspace, ...; reach the end destination.
       • Analysis is based on an air traffic model with a hybrid state space; the model is stochastic because wind and turbulence disturb the flight path.
       • Safety → probabilistic [J. Hu, 2003]
  4. Introduction: Other Applications
       • Systems biology: DNA replication, HIV treatment
       • Industrial robotics: pick-and-place tasks
       • ... [A. Singh, 2010]
  5. Discrete-Time Stochastic Hybrid Systems
       • Hybrid state space: $S = \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$
       • Stochastic transitions: transition kernels in discrete time for discrete transitions $T_q$, reset transitions $T_r$, and continuous transitions $T_x$
       • Controlled / autonomous: control of transitions, with either a continuous or a finite action space; a policy is a string of controls
       • ⇒ Definitions of SHS vary widely, e.g. initial states vs. initial subsets [A. Abate, 2008]
  6. Reachability Analysis
       • Determine whether a given SHS will reach a certain target set $K$ within a time horizon $[0, N]$, starting from a set of initial states $s_0$. $N$ can be either finite or infinite.
       • Figure: target set $K$ and initial state $s_0$
  7. Reach-Avoid Problem
       • Determine the probability $r_{s_0}$ that, given an initial state $s_0$, the SHS will reach a certain target set $K$ within a time horizon $[0, N]$ while staying inside the safe set $A$.
       • Figure: safe set $A$, target set $K$, and initial state $s_0$
  8. Reach-Avoid Trajectory
       • $j$ = first hitting time of the target set $K$, with $j \le N$; the state trajectory stays in the safe set $A$ until $j$.
       • $r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$
       • Indicator function: $1_K(s_k) = 1$ if $s_k \in K$, and $0$ otherwise.
       • Figure: safe set $A$, target set $K$, and initial state $s_0$ [S. Summers, 2010]
  12. Verification and Control
       • Verification: find the probability associated with a reach-avoid problem, $r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$
       • Control: find a policy $\pi$ that maximizes $r^{\pi}_{s_0}$, i.e. $\sup_{\pi} r^{\pi}_{s_0} = \sup_{\pi} E^{\pi}_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$ [S. Summers, 2010]
  13. Dynamic Programming (1/2)
       • Define the value function $V_k : S \to [0, 1]$ as $V_k(s) = E_s\left[ \sum_{j=k}^{N} \left( \prod_{i=k}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$
       • Then it follows that $V_0(s_0) = r_{s_0}$.
  14. Dynamic Programming (2/2)
       • Verification: starting from $V_N(s) = 1_K(s)$ for all $s \in S$, iterate backward for $k = N-1, \dots, 0$: $V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}]$ for all $s \in S$. Then $V_0(s_0) = r_{s_0}$.
       • Control: maximize at every iteration step: $V_k^*(s) = \sup_{\pi} \left\{ 1_K(s) + 1_{A \setminus K}(s)\, E^{\pi}_s[V_{k+1}^*] \right\}$ for all $s \in S$.
  15. Computational Issues
       • The recursion often cannot be written out analytically → approximation: $\hat{V}_k \approx V_k$
       • Curse of dimensionality
       • Difference between the exact solution and the approximation
       • Approximations of the value function and/or policy include discrete approximation (discretization of the action-state space) and functional approximation over the action-state space
  16. Current Methods: Dynamic Programming
       • Discretization: partitioning of the state space
       • Finite action space and hybrid state space ⇒ Markov chain (= finite action-state process)
       • $\hat{V}_k$ = tabular form
       • Figure: Discretization of Hybrid State Space ($S$) [A. Abate, 2007]
  17. Current Methods: Dynamic Programming (discretization)
       • + Error bounds
       • − Partitioning method
       • − Curse of dimensionality: bad scaling towards higher dimensions
       • Goal: less conservative error bounds; optimal partitioning; action partitioning; functional approximation
  18. Current Methods: Dynamic Programming
       • Functional approximation for SHS with a finite action space: $\hat{V}_k(s, \theta_k) = \sum_{i=1}^{h} \theta^{q}_{i,k}\, \phi_i(x)$, with $s = (q, x) \in S$
       • Parameter vector: $\theta_k = (\theta_k^{q_1}, \dots, \theta_k^{q_m})$
       • For each discrete mode $q$: $\theta_k^{q} = (\theta^{q}_{1,k}, \dots, \theta^{q}_{h,k})$ [A. Abate, 2008]
  19. Functional Approximation
       • − Only applied to safety problems
       • − Only applied to finite action spaces
       • − No error bounds (yet)
       • + Curse of dimensionality: better scaling qualities
  20. A Learning Approach: Related Work on Discounted Return Problems
       • Control objective: $\max_{\pi} J^{\pi}(s_0) = \max_{\pi} E^{\pi}_{s_0}\left[ \sum_{k=0}^{N} \gamma^{k} r_k \right]$, with $r_k$ the reward at time $k$ and $\gamma \in [0, 1)$ the discount factor
       • Model free: samples $(s_k, a, s_{k+1}, r_k)$
       • Approximation methods for continuous state spaces
       • Most methods are for $N \to \infty$, e.g. (approximate) Q-learning, LSPI, actor-critic, ...
  21. Fitted Value Iteration (1/2)
       1. Collect samples $(s_k, a, s_{k+1}, r_k)$ at $M$ states $s_i$, $i = 1, \dots, M$
       2. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$
       3. Fit the value function to $\tilde{V}_k$: $\hat{V}_k = \mathrm{fit}(\tilde{V}_k)$
       4. $k \leftarrow k - 1$, go to 2
       [R. Munos, 2008]
  22. Fitted Value Iteration (2/2)
       • Finite action space & continuous state space
       • Monte-Carlo approximations
       • Probabilistic error bounds on the value functions, depending on (∼) the descriptive power of the approximation functions and the limited number of samples in the Monte-Carlo approximations
       • Extensions/variations available for: sample usage; the action-value function $Q(s, a)$; continuous states and actions
  23. Plan of Work
       1. Control synthesis: fitted value iteration for SHS with a finite control space, finite horizon $N$, batch samples, and kernel-based approximation
       2. Finite-horizon error bounds
       3. Infinite-horizon error bounds
       4. Extensions: tree-based fitted Q-iteration, continuous actions, infinite horizon
  24. Other research lines
       • Functional approximation after discretization
       • LSPI for $N \to \infty$
       • ...
  25. Thank you for your time. Are there any questions?
  26. Appendix Slides
       • Discounted Return vs Reach-avoid
       • FVI: Formulas
  27. Discounted Return vs Reach-avoid
       • Discounted return: $V_k(s) = r_k + \gamma E_s[V_{k+1}]$ for all $s \in S$
       • Reach-avoid: $V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}]$ for all $s \in S$
  28. FVI: Formulas
       1. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$. For a given function $\hat{V}_k$, and using samples, the Monte-Carlo estimate $\tilde{V}$ of $T(\hat{V}_k)$ can be determined at the $M$ base points as follows (here $A$ denotes the action set): $\tilde{V}(s_i) = \max_{a \in A} \frac{1}{h} \sum_{j=1}^{h} \left[ r_j^{s_i,a} + \gamma \hat{V}_k\left(s_{k+1,j}^{s_i,a}\right) \right]$
       2. Fit the value function to $\tilde{V}$: $\hat{V}_{k+1} = \arg\min_{f \in F} \sum_{i=1}^{M} \left| f(s_i) - \tilde{V}(s_i) \right|^{p}$
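
The fitted value iteration steps listed on the slides can be combined with the reach-avoid recursion in a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the slides: the scalar toy dynamics, the sets `SAFE` and `TARGET`, the action set, the sample counts, and the helper names. The fit step uses a Gaussian kernel smoother in place of the least-squares fit over a function class $F$, in the spirit of the kernel-based approximation named in the plan of work.

```python
import math
import random

# Hypothetical toy problem (not from the slides): scalar state x,
# dynamics x' = x + a + Gaussian noise, three admissible actions.
SAFE = (0.0, 1.0)      # safe set A
TARGET = (0.8, 1.0)    # target set K, chosen inside A
ACTIONS = (-0.1, 0.0, 0.1)
NOISE_STD = 0.05


def in_target(x):
    return TARGET[0] <= x <= TARGET[1]


def in_safe(x):
    return SAFE[0] <= x <= SAFE[1]


def step(x, a, rng):
    """Draw one successor state from the toy transition kernel."""
    return x + a + rng.gauss(0.0, NOISE_STD)


def kernel_smoother(points, values, bandwidth=0.05):
    """Gaussian-kernel average of `values` located at `points`."""
    def smooth(x):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return smooth


def make_value(smooth):
    """Wrap a smoother so V = 1 on K and V = 0 outside A hold exactly."""
    def value(x):
        if in_target(x):
            return 1.0
        if not in_safe(x):
            return 0.0
        return min(1.0, max(0.0, smooth(x)))
    return value


def fitted_value_iteration(horizon, num_points=41, num_samples=50, seed=0):
    """Approximate the optimal reach-avoid value function V_0."""
    rng = random.Random(seed)
    width = SAFE[1] - SAFE[0]
    points = [SAFE[0] + width * i / (num_points - 1) for i in range(num_points)]
    value = make_value(lambda x: 0.0)    # terminal condition V_N = 1_K
    for _ in range(horizon):             # backward in time: k = N-1, ..., 0
        targets = []
        for s in points:
            if in_target(s):
                targets.append(1.0)
                continue
            # Monte-Carlo estimate of max_a E[V_{k+1}(s') | s, a]
            targets.append(max(
                sum(value(step(s, a, rng)) for _ in range(num_samples))
                / num_samples
                for a in ACTIONS))
        value = make_value(kernel_smoother(points, targets))
    return value


v0 = fitted_value_iteration(horizon=3)
print(round(v0(0.75), 2), round(v0(0.1), 2))
```

Only the values on $A \setminus K$ are smoothed; the indicator structure of the recursion is enforced exactly by `make_value`, so sampled successors that leave the safe set contribute zero.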

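For the verification problem on a discretized system, the reach-avoid recursion from the dynamic programming slides reduces to a backward sweep over a finite Markov chain. The sketch below assumes a hypothetical four-state chain `P` with made-up transition probabilities, and it cross-checks the recursion against a plain Monte-Carlo simulation of the reach-avoid probability. Function and variable names are illustrative only.

```python
import random


def reach_avoid_dp(P, safe, target, horizon):
    """Backward recursion: V_N = 1_K, V_k = 1_K + 1_(A minus K) * E[V_{k+1}]."""
    n = len(P)
    v = [1.0 if s in target else 0.0 for s in range(n)]
    for _ in range(horizon):
        new_v = []
        for s in range(n):
            if s in target:
                new_v.append(1.0)            # already in K
            elif s in safe:
                new_v.append(sum(P[s][t] * v[t] for t in range(n)))
            else:
                new_v.append(0.0)            # left the safe set
        v = new_v
    return v


def reach_avoid_mc(P, safe, target, horizon, s0, runs=20000, seed=0):
    """Monte-Carlo estimate of the reach-avoid probability from s0."""
    rng = random.Random(seed)
    states = range(len(P))
    hits = 0
    for _ in range(runs):
        s = s0
        for k in range(horizon + 1):
            if s in target:
                hits += 1
                break
            if s not in safe or k == horizon:
                break
            s = rng.choices(states, weights=P[s])[0]
    return hits / runs


# Hypothetical chain: state 0 is unsafe, states 1 and 2 are safe, state 3 is the target.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.5, 0.0],
    [0.1, 0.2, 0.2, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
SAFE_STATES = {1, 2, 3}
TARGET_STATES = {3}

values = reach_avoid_dp(P, SAFE_STATES, TARGET_STATES, horizon=2)
estimate = reach_avoid_mc(P, SAFE_STATES, TARGET_STATES, horizon=2, s0=2)
print([round(x, 3) for x in values], round(estimate, 3))
```

The table `values` plays the role of the tabular $\hat{V}_k$ obtained after discretization; its size grows with the number of partition cells, which is where the curse of dimensionality enters.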