Literature Colloquium


- 1. Introduction | Stochastic Hybrid Systems | A Learning Approach | Discussion & Future Work. A Learning Approach to Verification and Control of Stochastic Hybrid Systems. Literature Colloquium. Sofie Haesaert, DCSC. Supervisors: dr. ir. A. Abate and prof. dr. R. Babuška. May 28, 2012. Honeywell.com. 1 / 25. Verification & Control of SHS
- 2. Outline. Introduction: Air Traffic Safety and Control; Applications. Stochastic Hybrid Systems: Discrete-time Stochastic Hybrid Systems; Stochastic Hybrid Systems: Properties; Verification and Control. A Learning Approach: Current Methods: Dynamic Programming; Related Work. Discussion & Future Work.
- 3. Air Traffic Safety and Control. Flight safety: avoid other airplanes, bad weather conditions, restricted airspace, ...; reach the end destination. Analysis is based on an air traffic model: hybrid state space; stochastic due to wind and turbulence disturbing the flight path. Safety → probabilistic. [J. Hu, 2003]
- 4. Other Applications. Systems biology: DNA replication, HIV treatment. Industrial robotics: pick-and-place tasks. ... [A. Singh, 2010]
- 5. Discrete-Time Stochastic Hybrid Systems. Hybrid state space: $S = \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$. Stochastic transitions: transition kernels in discrete time for discrete transitions $T_q$, reset transitions $T_r$, and continuous transitions $T_x$. Controlled / autonomous: control of transitions, with either a continuous or a finite action space; policy = string of controls. ⇒ Lots of variations in the definition of SHS, e.g. initial states vs. initial subsets. [A. Abate, 2008]
- 6. Reachability Analysis. (Figure: target set $K$ and initial state $s_0$.) Determine whether a given SHS will reach a certain target set $K$ within a time horizon $[0, N]$, starting from a set of initial states $s_0$. $N$ can be either finite or infinite.
- 7. Reach-Avoid Problem. (Figure: safe set $A$, target set $K$, initial state $s_0$.) Determine the probability $r_{s_0}$ that, given an initial state $s_0$, the SHS will reach a certain target set $K$ within a time horizon $[0, N]$ while staying inside the safe set $A$.
- 8. Reach-avoid trajectory: with $j$ the first hitting time of the target set $K$, $j \le N$ and the state trajectory stays in the safe set $A$ until $j$. (Figure: $A$, $K$, $s_0$.) $r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$, with indicator function $1_K(s_k) = 1$ if $s_k \in K$ and $0$ otherwise. [S. Summers, 2010]
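The sum-of-products expression on the slide above can be evaluated for a single sampled trajectory, which is what a Monte Carlo estimate of $r_{s_0}$ would average. The following is a minimal sketch; the function name and the predicate arguments `in_target` and `in_safe` are illustrative assumptions, not part of the slides.

```python
def reach_avoid_indicator(trajectory, in_target, in_safe):
    """Evaluate sum_j (prod_{i<j} 1_{A minus K}(s_i)) * 1_K(s_j) for one trajectory.

    Returns 1 if the trajectory hits the target set K while every earlier
    state lies in the safe set A (and outside K), and 0 otherwise.
    """
    total = 0
    stay = 1  # running product of 1_{A minus K}(s_i) over i < j
    for s in trajectory:
        total += stay * (1 if in_target(s) else 0)
        stay *= 1 if (in_safe(s) and not in_target(s)) else 0
    return total


# Hypothetical scalar example: K = {s >= 3}, A = {s >= 0}.
hit = reach_avoid_indicator([0, 1, 2, 3], lambda s: s >= 3, lambda s: s >= 0)
```

Averaging this quantity over many simulated trajectories started at $s_0$ would give an empirical estimate of $r_{s_0}$.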
- 12. Verification: find the probability associated with a reach-avoid problem, $r_{s_0} = E_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$. Control: find a policy $\pi$ that maximizes $r^{\pi}_{s_0}$: $\sup_{\pi} r^{\pi}_{s_0} = \sup_{\pi} E^{\pi}_{s_0}\left[ \sum_{j=0}^{N} \left( \prod_{i=0}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$. [S. Summers, 2010]
- 13. Dynamic Programming (1/2). Define the value function $V_k : S \to [0, 1]$, $V_k(s) = E_s\left[ \sum_{j=k}^{N} \left( \prod_{i=k}^{j-1} 1_{A \setminus K}(s_i) \right) 1_K(s_j) \right]$. Then it follows that $V_0(s_0) = r_{s_0}$.
- 14. Dynamic Programming (2/2). Verification: starting from $V_N(s) = 1_K(s)$ $\forall s \in S$, iterate backward for $k = N-1, \dots, 0$: $V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}]$, $\forall s \in S$. Then $V_0(s_0) = r_{s_0}$. Control: maximization at every iteration step, $V_k^{*}(s) = \sup_{\pi} \left\{ 1_K(s) + 1_{A \setminus K}(s)\, E_s^{\pi}[V_{k+1}^{*}] \right\}$, $\forall s \in S$.
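For a system that is already a finite Markov chain, the verification recursion above reduces to repeated matrix-vector products. The sketch below assumes states are indexed `0..n-1` and that `P[s][t]` holds the transition probability from `s` to `t`; these names and the three-state example are illustrative assumptions.

```python
def reach_avoid_dp(P, target, safe, N):
    """Backward recursion V_k = 1_K + 1_{A minus K} * (P V_{k+1}), with V_N = 1_K."""
    n = len(P)
    ind_K = [1.0 if s in target else 0.0 for s in range(n)]
    ind_AK = [1.0 if (s in safe and s not in target) else 0.0 for s in range(n)]
    V = ind_K[:]  # terminal condition V_N
    for _ in range(N):  # k = N-1, ..., 0
        expected = [sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
        V = [ind_K[s] + ind_AK[s] * expected[s] for s in range(n)]
    return V  # V_0


# Hypothetical chain: state 0 is safe, state 1 is the target, state 2 is unsafe.
P_example = [[0.5, 0.3, 0.2],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
V0 = reach_avoid_dp(P_example, target={1}, safe={0, 1}, N=2)
```

In this example the entry `V0[0]` is the probability of reaching state 1 within two steps from state 0 without first entering state 2.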
- 15. Computational Issues. The recursion often cannot be written out analytically → approximation $V_k \approx \hat{V}_k$. Issues: curse of dimensionality; difference between the exact solution and the approximation. Approximations of the value function and/or policy include: discrete approximation (discretization of the action-state space) and functional approximation over the action-state space.
- 16. Current Methods: Dynamic Programming. Discretization: partitioning of the state space. Finite action space and hybrid state space ⇓ Markov chain (= finite action-state process), with $\hat{V}_k$ in tabular form. Figure: Discretization of Hybrid State Space ($S$). [A. Abate, 2007]
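To illustrate the partitioning idea, the sketch below grids a one-dimensional continuous state space and approximates the transition kernel by a finite matrix evaluated at the cell centers. It is not the construction from the cited work: the scalar dynamics $x_{k+1} = a x_k + w$ with Gaussian noise, the single discrete mode, and all names are assumptions made for the example.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discretize(a, sigma, lo, hi, n):
    """Partition [lo, hi] into n cells for x' = a * x + w, w ~ N(0, sigma^2).

    Returns the cell centers and a matrix P where P[i][j] approximates the
    probability of moving from cell i (represented by its center) to cell j.
    Rows may sum to less than 1: the missing mass leaves [lo, hi].
    """
    width = (hi - lo) / n
    centers = [lo + (i + 0.5) * width for i in range(n)]
    P = []
    for c in centers:
        mean = a * c
        row = [normal_cdf((lo + (j + 1) * width - mean) / sigma)
               - normal_cdf((lo + j * width - mean) / sigma)
               for j in range(n)]
        P.append(row)
    return centers, P


centers, P = discretize(a=0.5, sigma=0.1, lo=-1.0, hi=1.0, n=20)
```

The number of cells grows exponentially with the state dimension when every axis is gridded this way, which is the scaling problem the next slide refers to.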
- 17. Discretization: (+) error bounds; (−) partitioning method; (−) curse of dimensionality: bad scaling towards higher dimensions. Goal: less conservative error bounds; optimal partitioning; action partitioning; functional approximation.
- 18. Current Methods: Dynamic Programming. Functional approximation for SHS with a finite action space: $\hat{V}_k(s, \theta_k) = \sum_{i=1}^{h} \theta^{q}_{i,k}\, \phi_i(x)$, $s = (q, x) \in S$. Parameter vector $\theta_k = (\theta_k^{q_1}, \dots, \theta_k^{q_m})$; for each discrete mode $q$: $\theta_k^{q} = (\theta^{q}_{1,k}, \dots, \theta^{q}_{h,k})$. [A. Abate, 2008]
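The linear-in-parameters form above can be evaluated as follows. The slides do not specify the basis functions $\phi_i$, so this sketch assumes Gaussian radial basis functions on a scalar continuous state; the names `rbf`, `v_hat`, and the parameter values are illustrative.

```python
import math


def rbf(x, center, width):
    """Gaussian radial basis function (an assumed choice for phi_i)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def v_hat(s, theta, centers, width):
    """Evaluate sum_i theta[q][i] * phi_i(x) for a hybrid state s = (q, x)."""
    q, x = s
    return sum(t * rbf(x, c, width) for t, c in zip(theta[q], centers))


# One parameter vector per discrete mode, as on the slide.
theta = {"q1": [1.0, 0.0], "q2": [0.0, 2.0]}
value = v_hat(("q2", 1.0), theta, centers=[0.0, 1.0], width=0.5)
```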
- 19. Functional approximation: (−) only applied to safety problems; (−) only applied to finite action spaces; (−) no error bounds (yet); (+) curse of dimensionality: better scaling qualities.
- 20. A Learning Approach: Related Work on Discounted Return Problems. Control objective: $\max_{\pi} J^{\pi}(x) = \max_{\pi} E^{\pi}_{s_0}\left[ \sum_{k=0}^{N} \gamma^{k} r_k \right]$, with $r_k$ the reward at time $k$ and $\gamma \in [0, 1)$ the discount factor. Model free: samples $(s_k, a, s_{k+1}, r_k)$. Approximation methods for continuous state spaces. Most methods are for $N \to \infty$, e.g. (approximate) Q-learning, LSPI, actor-critic, ...
- 21. Fitted Value Iteration (1/2). 1. Collect samples $(s_k, a, s_{k+1}, r_k)$ at $M$ states $s_i$, $i = 1, \dots, M$. 2. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$. 3. Fit the value function to $\tilde{V}_k$: $\hat{V}_k = \mathrm{fit}(\tilde{V}_k)$. 4. $k \leftarrow k - 1$, go to 2. [R. Munos, 2008]
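The four steps above can be sketched for a reach-avoid objective on a scalar state with a finite action set. This is an illustration under stated assumptions, not the method of the cited work: the fit is a Gaussian kernel averager chosen for simplicity, fresh samples are drawn in every iteration instead of reusing a fixed batch, and `step`, `in_K`, `in_A`, and `bandwidth` are hypothetical names.

```python
import math
import random


def fitted_value_iteration(step, actions, in_K, in_A, base_points, N, h, bandwidth, rng):
    """Reach-avoid fitted value iteration with a kernel-averaging fit (a sketch)."""

    def make_fit(points, values):
        # Fit step: kernel-weighted mean of the estimates at the base points.
        def v_hat(x):
            weights = [math.exp(-((x - p) ** 2) / (2.0 * bandwidth ** 2)) for p in points]
            total = sum(weights)
            if total == 0.0:
                return 0.0
            return sum(w * v for w, v in zip(weights, values)) / total
        return v_hat

    def v_next(x):  # terminal condition V_N = 1_K
        return 1.0 if in_K(x) else 0.0

    for _ in range(N):  # k = N-1, ..., 0
        estimates = []
        for s in base_points:
            if in_K(s):
                estimates.append(1.0)  # 1_K(s) = 1
            elif not in_A(s):
                estimates.append(0.0)  # outside the safe set
            else:
                per_action = []
                for a in actions:
                    # Monte Carlo estimate of E[V_{k+1}] from h sampled successors.
                    draws = [v_next(step(s, a, rng)) for _sample in range(h)]
                    per_action.append(sum(draws) / h)
                estimates.append(max(per_action))
        v_next = make_fit(base_points, estimates)
    return v_next


# Hypothetical usage: noisy integrator, safe set [-1, 1], target set [0.8, 1].
rng = random.Random(0)
v0 = fitted_value_iteration(
    step=lambda x, a, r: x + a + r.uniform(-0.05, 0.05),
    actions=[-0.2, 0.0, 0.2],
    in_K=lambda x: 0.8 <= x <= 1.0,
    in_A=lambda x: -1.0 <= x <= 1.0,
    base_points=[-1.0 + 0.1 * i for i in range(21)],
    N=5, h=20, bandwidth=0.05, rng=rng,
)
```

Because the kernel averager returns a convex combination of the estimates, the fitted function stays in $[0, 1]$, which matches the range of the reach-avoid value function.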
- 22. Fitted Value Iteration (2/2). Finite action space & continuous state space. Monte Carlo approximations. Probabilistic error bounds on the value functions: ∼ descriptive power of the approximation functions; ∼ limited number of samples in the Monte Carlo approximations. Extensions/variations available for: sample usage; action-value function $Q(s, a)$; continuous states and actions.
- 23. Plan of Work. 1. Control synthesis: fitted value iteration for SHS with a finite control space, finite horizon $N$, batch samples, kernel-based approximation. 2. Finite-horizon error bounds. 3. Infinite-horizon error bounds. 4. Extensions: tree-based fitted Q-iteration, continuous actions, infinite horizon.
- 24. Other research lines: functional approximation after discretization; LSPI for $N \to \infty$; ...
- 25. Thank you for your time. Are there any questions?
- 26. Appendix Slides. Discounted Return vs Reach-avoid. FVI: Formulas.
- 27. Discounted Return vs Reach-avoid. Discounted return: $V_k(s) = r_k + \gamma\, E_s[V_{k+1}]$ $\forall s \in S$. Reach-avoid: $V_k(s) = 1_K(s) + 1_{A \setminus K}(s)\, E_s[V_{k+1}]$ $\forall s \in S$.
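Read side by side, the two recursions share one shape: a per-state reward plus a per-state factor times the expected next value. The sketch below makes that explicit for a finite state set; `backup` and its argument names are illustrative, and the expected next values are simply given as a list.

```python
def backup(reward, factor, expected_next):
    """One backup V_k(s) = reward(s) + factor(s) * E_s[V_{k+1}] for each state s."""
    return [r + g * ev for r, g, ev in zip(reward, factor, expected_next)]


expected_next = [0.2, 0.5, 0.9]

# Discounted return: reward r_k(s), constant factor gamma.
gamma = 0.9
discounted = backup([1.0, 0.0, 2.0], [gamma] * 3, expected_next)

# Reach-avoid: reward 1_K(s), state-dependent factor 1_{A minus K}(s).
# Here state 0 is in K, state 1 is in A but not K, state 2 is outside A.
reach_avoid = backup([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], expected_next)
```

Seen this way, the reach-avoid recursion replaces the constant discount $\gamma$ with an indicator that switches the continuation term on only inside $A \setminus K$.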
- 28. FVI: Formulas. 1. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$. For a given function $\hat{V}_{k-1}$, and using samples, the Monte Carlo estimate $\tilde{V}$ of $T(\hat{V}_{k-1})$ can be determined at the $M$ base points as follows: $\tilde{V}(s_i) = \max_{a \in A} \frac{1}{h} \sum_{j=1}^{h} \left[ r_j^{s_i,a} + \gamma V_k\left(s_{k+1}^{s_i,a}\right) \right]$. 2. Fit the value function to $\tilde{V}_k$: $\hat{V}_{k+1} = \arg\min_{f \in F} \sum_{i=1}^{M} \left| f(s_i) - \tilde{V}(s_i) \right|^{p}$.
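Step 1 above, the Monte Carlo estimate at one base point, can be sketched as below. The sampler `sample(s, a)`, assumed to return a `(reward, next_state)` pair, and the other names are hypothetical; the fit in step 2 is not shown.

```python
def mc_backup(s, actions, sample, v, gamma, h):
    """Estimate max_a (1/h) * sum_j [r_j + gamma * v(s'_j)] at base point s."""
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(h):
            r, s_next = sample(s, a)  # one sampled transition from (s, a)
            total += r + gamma * v(s_next)
        best = max(best, total / h)
    return best


# Hypothetical deterministic sampler, used only to show the call shape.
estimate = mc_backup(
    s=1.0,
    actions=[0, 1],
    sample=lambda s, a: (float(a), s + a),
    v=lambda x: x,
    gamma=0.5,
    h=4,
)
```

Repeating this at each of the $M$ base points yields the targets $\tilde{V}(s_i)$ that the fit in step 2 regresses on.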
