Literature Colloquium

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Introduction | Stochastic Hybrid Systems | A Learning Approach | Discussion & Future Work
      A Learning Approach to Verification and Control of Stochastic Hybrid Systems
      Literature Colloquium
      Sofie Haesaert, DCSC
      Supervisors: dr. ir. A. Abate and prof. dr. R. Babuška
      May 28, 2012
      Honeywell.com
      1 / 25, Verification & Control of SHS
    • Outline
      Introduction: Air Traffic Safety and Control; Applications
      Stochastic Hybrid Systems: Discrete-time Stochastic Hybrid Systems; Stochastic Hybrid Systems: Properties; Verification and Control
      A Learning Approach: Current Methods: Dynamic Programming; Related Work
      Discussion & Future Work
    • Air Traffic Safety and Control
      Flight safety
        Avoid: other airplanes, bad weather conditions, restricted airspace, ...
        Reach the end destination
      Analysis based on an air traffic model
        Hybrid state space
        Stochastic, due to wind and turbulence disturbing the flight path
        Safety → probabilistic
      [J. Hu, 2003]
    • Introduction: Other Applications
      Systems biology: DNA replication, HIV treatment
      Industrial robotics: pick-and-place tasks
      ...
      [A. Singh, 2010]
    • Discrete-Time Stochastic Hybrid Systems
      Hybrid state space: $S = \bigcup_{q \in Q} \{q\} \times \mathbb{R}^{n(q)}$
      Stochastic transitions: transition kernels in discrete time for
        discrete transitions $T_q$
        reset transitions $T_r$
        continuous transitions $T_x$
      Controlled / autonomous: control of transitions, with either a continuous or a finite action space
      Policy = string of controls
      ⇒ Many variations in the definition of SHS, e.g. initial states vs. initial subsets.
      [A. Abate, 2008]
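The three kernels can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two-mode thermostat, its switching probabilities, drift and noise levels, and the names `switch_probability`, `step` and `simulate` are all hypothetical and do not come from the slides. The system is autonomous (no control input), and both modes share a one-dimensional continuous state.

```python
import random

MODES = ("off", "on")  # hypothetical discrete modes Q


def switch_probability(q, x):
    """Hypothetical discrete kernel T_q: probability of leaving mode q at temperature x."""
    if q == "off":
        return 0.9 if x < 18.0 else 0.05
    return 0.9 if x > 22.0 else 0.05


def step(q, x, rng):
    """One discrete-time transition of the toy hybrid system."""
    if rng.random() < switch_probability(q, x):
        # Mode change: the reset kernel T_r draws the new continuous state.
        q_next = "on" if q == "off" else "off"
        x_next = x + rng.gauss(0.0, 0.01)
    else:
        # No mode change: the continuous kernel T_x of the current mode applies.
        q_next = q
        drift = 0.5 if q == "on" else -0.5
        x_next = x + drift + rng.gauss(0.0, 0.1)
    return q_next, x_next


def simulate(q0, x0, horizon, seed=0):
    """Sample one trajectory (s_0, ..., s_N) with s_k = (q_k, x_k)."""
    rng = random.Random(seed)
    trajectory = [(q0, x0)]
    for _ in range(horizon):
        trajectory.append(step(*trajectory[-1], rng))
    return trajectory


trajectory = simulate("off", 20.0, 10)
```

A controlled variant would pass an action to `switch_probability` and `step`; a policy would then be the sequence of maps that selects that action at each time step.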
    • Reachability Analysis
      [Figure: target set $K$ and initial states $s_0$]
      Determine whether a given SHS will reach a certain target set $K$ within a time horizon $[0, N]$, starting from a set of initial states $s_0$. $N$ can be either finite or infinite.
    • Reach-Avoid Problem
      [Figure: safe set $A$, target set $K$ and initial state $s_0$]
      Determine the probability $r_{s_0}$ that, given an initial state $s_0$, the SHS will reach a certain target set $K$ within a time horizon $[0, N]$ while staying inside the safe set $A$.
    • Reach-avoid trajectory
      [Figure: safe set $A$, target set $K$ and initial state $s_0$]
      $j$ = first hitting time of target set $K$
        $j \le N$
        State trajectory stays in safe set $A$ until $j$
      $$r_{s_0} = E_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} \mathbf{1}_{A \setminus K}(s_i) \right) \mathbf{1}_K(s_j) \right]$$
      Indicator function: $\mathbf{1}_K(s_k) = 1$ if $s_k \in K$, and $0$ otherwise
      [S. Summers, 2010]
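For a single trajectory, the sum of products inside the expectation has at most one non-zero term: the one at the first hitting time of $K$, provided every earlier state lies in $A \setminus K$. The sketch below evaluates that quantity and averages it over simulated runs to obtain a Monte Carlo estimate of $r_{s_0}$. The scalar random-walk dynamics, the sets, and the names `reach_avoid_indicator` and `estimate_reach_avoid` are illustrative assumptions, not part of the presentation.

```python
import random


def reach_avoid_indicator(trajectory, in_target, in_safe):
    """Return 1 if the trajectory reaches K while all earlier states are in A minus K, else 0."""
    for s in trajectory:
        if in_target(s):
            return 1  # first hitting time j found, all earlier states were safe
        if not in_safe(s):
            return 0  # left the safe set A before reaching K
    return 0  # K not reached within the horizon


def estimate_reach_avoid(s0, horizon, step, in_target, in_safe, runs=2000, seed=0):
    """Monte Carlo estimate of the reach-avoid probability from s0 over [0, N]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        trajectory = [s0]
        for _ in range(horizon):
            trajectory.append(step(trajectory[-1], rng))
        hits += reach_avoid_indicator(trajectory, in_target, in_safe)
    return hits / runs


# Hypothetical toy problem: K = [1, inf), A = (-1, inf), Gaussian random walk with drift.
in_target = lambda s: s >= 1.0
in_safe = lambda s: s > -1.0
step = lambda s, rng: s + rng.gauss(0.1, 0.3)

estimate = estimate_reach_avoid(0.0, 10, step, in_target, in_safe)
```

Such a sampling estimate carries statistical error that shrinks with the number of runs; it is shown only to make the formula concrete.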
    • Verification and Control
      Verification: find the probability associated with a reach-avoid problem
      $$r_{s_0} = E_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} \mathbf{1}_{A \setminus K}(s_i) \right) \mathbf{1}_K(s_j) \right]$$
      Control: find a policy $\pi$ that maximizes $r^{\pi}_{s_0}$
      $$\sup_{\pi} r^{\pi}_{s_0} = \sup_{\pi} E^{\pi}_{s_0}\left[ \sum_{j \in [0,N]} \left( \prod_{i=0}^{j-1} \mathbf{1}_{A \setminus K}(s_i) \right) \mathbf{1}_K(s_j) \right]$$
      [S. Summers, 2010]
    • Dynamic Programming (1/2)
      Define the value function $V_k : S \to [0, 1]$
      $$V_k(s) = E_s\left[ \sum_{j \in [k,N]} \left( \prod_{i=k}^{j-1} \mathbf{1}_{A \setminus K}(s_i) \right) \mathbf{1}_K(s_j) \right]$$
      Then it follows that $V_0(s_0) = r_{s_0}$.
    • Dynamic Programming (2/2)
      Verification: starting from $V_N(s) = \mathbf{1}_K(s)$, $\forall s \in S$, iterate backward for $k = N-1, \dots, 0$
      $$V_k(s) = \mathbf{1}_K(s) + \mathbf{1}_{A \setminus K}(s)\, E_s[V_{k+1}], \quad \forall s \in S$$
      Then $V_0(s_0) = r_{s_0}$.
      Control: maximization at every iteration step
      $$V^*_k(s) = \sup_{\pi} \left\{ \mathbf{1}_K(s) + \mathbf{1}_{A \setminus K}(s)\, E^{\pi}_s[V^*_{k+1}] \right\}, \quad \forall s \in S$$
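On a finite state space the verification recursion can be carried out exactly. The following minimal sketch assumes a hypothetical five-state random walk, with state 4 as the target set $K$ and state 0 as the only unsafe state; the function name `reach_avoid_dp` and the chain itself are illustrative, not taken from the slides.

```python
def reach_avoid_dp(P, target, safe, horizon):
    """Backward recursion V_k = 1_K + 1_{A minus K} * E[V_{k+1}], started from V_N = 1_K."""
    n = len(P)
    V = [1.0 if s in target else 0.0 for s in range(n)]  # V_N
    for _ in range(horizon):  # k = N-1, ..., 0
        V_next = V
        V = []
        for s in range(n):
            if s in target:
                V.append(1.0)  # already in K
            elif s in safe:
                # s is in A minus K: expectation of V_{k+1} over the next state
                V.append(sum(P[s][t] * V_next[t] for t in range(n)))
            else:
                V.append(0.0)  # outside the safe set A
    return V  # V_0


# Hypothetical random walk on {0, ..., 4}: move left or right with probability 0.5.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
V0 = reach_avoid_dp(P, target={4}, safe={1, 2, 3, 4}, horizon=2)
# V0 == [0.0, 0.0, 0.25, 0.5, 1.0]
```

From state 2 the only successful path within two steps is right, right, which has probability 0.25; this matches `V0[2]`.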
    • Computational Issues
      The recursion often cannot be written out analytically → approximation: $V_k \approx \hat{V}_k$
        Curse of dimensionality
        Difference between exact solution and approximation
      Approximations of the value function and/or policy include:
        Discrete approximation: discretization of the action-state space
        Functional approximation over the action-state space
    • Current Methods: Dynamic Programming
      Discretization: partitioning of the state space
      Finite action space and hybrid state space
      ⇓
      Markov chain (= finite action-state process)
      $\hat{V}_k$ = tabular form
      Figure: Discretization of Hybrid State Space ($S$)
      [A. Abate, 2007]
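A rough idea of the discretization step, for a one-dimensional continuous state only: partition an interval into cells, take the cell centers as representative points, and integrate the transition density over each cell. The sketch assumes the hypothetical dynamics $s_{k+1} = s_k + w_k$ with $w_k \sim \mathcal{N}(0, \sigma^2)$ and adds one absorbing "outside" state; the names `normal_cdf` and `discretize` are illustrative, and the slides do not specify this particular scheme.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discretize(lo, hi, n_cells, sigma):
    """Approximate s' = s + w, w ~ N(0, sigma^2), on [lo, hi] by a finite Markov chain."""
    width = (hi - lo) / n_cells
    edges = [lo + i * width for i in range(n_cells + 1)]
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(n_cells)]
    P = []
    for c in centers:
        # Probability of landing in cell j when starting from the representative point c.
        row = [
            normal_cdf((edges[j + 1] - c) / sigma) - normal_cdf((edges[j] - c) / sigma)
            for j in range(n_cells)
        ]
        row.append(max(0.0, 1.0 - sum(row)))  # remaining mass leaves [lo, hi]
        P.append(row)
    P.append([0.0] * n_cells + [1.0])  # the "outside" state is absorbing
    return centers, P


centers, P = discretize(-1.0, 1.0, 20, 0.2)
```

The resulting matrix `P` can be fed to a tabular backward recursion. The number of cells grows exponentially with the state dimension, which is the scaling problem the next slide refers to.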
    • Discretization: assessment
      + Error bounds
      − Partitioning method
      − Curse of dimensionality: bad scaling towards higher dimensions
      Goal
        Less conservative error bounds
        Optimal partitioning
        Action partitioning
        Functional approximation
    • Current Methods: Dynamic Programming
      Functional approximation: SHS with finite action space
      $$\hat{V}_k(s, \theta_k) = \sum_{i=1}^{h} \theta^{q}_{i,k}\, \phi_i(x), \quad s = (q, x) \in S$$
      Parameter vector $\theta_k = (\theta_k^{q_1}, \dots, \theta_k^{q_m})$
      For each discrete mode $q$: $\theta_k^{q} = (\theta^{q}_{1,k}, \dots, \theta^{q}_{h,k})$
      [A. Abate, 2008]
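The parametrization is linear in the parameters, with one weight vector per discrete mode. A minimal sketch of evaluating it, assuming a scalar continuous state and Gaussian radial basis functions as the $\phi_i$ (the slides do not say which basis functions are used; `rbf_features`, `value_estimate` and all numbers are hypothetical):

```python
import math


def rbf_features(x, centers, width):
    """Hypothetical basis functions phi_i(x): Gaussian bumps around the given centers."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def value_estimate(s, theta, centers, width):
    """Evaluate the sum over i of theta[q][i] * phi_i(x) for the hybrid state s = (q, x)."""
    q, x = s
    return sum(t * phi for t, phi in zip(theta[q], rbf_features(x, centers, width)))


# One parameter vector per discrete mode q (illustrative values for a fixed time step k).
theta = {"q1": [1.0, 0.0], "q2": [0.0, 2.0]}
centers = [0.0, 1.0]

v1 = value_estimate(("q1", 0.0), theta, centers, width=0.5)  # 1.0
v2 = value_estimate(("q2", 1.0), theta, centers, width=0.5)  # 2.0
```

The storage cost is $m \cdot h$ parameters per time step instead of one table entry per grid cell.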
    • Functional approximation: assessment
      − Only applied to safety problems
      − Only applied to finite action spaces
      − No error bounds (yet)
      + Curse of dimensionality: better scaling properties
    • A Learning Approach: Related Work on Discounted Return Problems
      Control objective: $\max_{\pi} J^{\pi}(x) = \max_{\pi} E^{\pi}_{s_0}\left[ \sum_{k=0}^{N} \gamma^k r_k \right]$
      with $r_k$ the reward at step $k$ and $\gamma \in [0, 1)$ the discount factor.
      Model free: samples $(s_k, a, s_{k+1}, r_k)$
      Approximation methods for continuous state spaces
      Most methods are for $N \to \infty$, e.g. (approximate) Q-learning, LSPI, actor-critic, ...
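As a small worked instance of the objective, the discounted return of one observed reward sequence can be computed as below. The function name `discounted_return` and the numbers are illustrative only.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k over a finite reward sequence r_0, ..., r_N."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], 0.5)  # 1.75
```

The control objective takes the expectation of this quantity over trajectories generated under a policy and maximizes it over policies.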
    • Fitted Value Iteration (1/2)
      1. Collect samples $(s_k, a, s_{k+1}, r_k)$ at $M$ states: $s_i$, $i = 1, \dots, M$
      2. Estimate the value function at the $M$ states: $\tilde{V}_k(s_i)$, $i = 1, \dots, M$
      3. Fit the value function to $\tilde{V}_k$: $\hat{V}_k = \mathrm{fit}(\tilde{V}_k)$
      4. $k \leftarrow k - 1$, go to 2
      [R. Munos, 2008]
    • Fitted Value Iteration (2/2)
      Finite action space & continuous state space
      Monte Carlo approximations
      Probabilistic error bounds on the value functions, depending on
        the descriptive power of the approximation functions
        the limited number of samples in the Monte Carlo approximations
      Extensions/variations available for
        Sample usage
        Action-value function: $Q(s, a)$
        Continuous states and actions
    • Plan of Work
      1. Control synthesis: Fitted Value Iteration for SHS with
           Finite control space
           Finite horizon $N$
           Batch samples
           Kernel-based approximation
      2. Finite-horizon error bounds
      3. Infinite-horizon error bounds
      4. Extensions: tree-based fitted Q-iteration, continuous actions, infinite horizon
    • Other research lines:
      Functional approximation after discretization
      LSPI for $N \to \infty$
      ...
    • Thank you for your time
      Are there any questions?
    • Appendix Slides
      Discounted Return vs Reach-avoid
      FVI: Formulas
    • Discounted Return vs Reach-avoid
      Discounted return: $V_k(s) = r_k + \gamma E_s[V_{k+1}]$, $\forall s \in S$
      Reach-avoid: $V_k(s) = \mathbf{1}_K(s) + \mathbf{1}_{A \setminus K}(s)\, E_s[V_{k+1}]$, $\forall s \in S$
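    • FVI: Formulas
      1. Estimate the value function at $M$ states: $\tilde{V}(s_i)$, $i = 1, \dots, M$
         For a given function $\hat{V}_k$, and using samples, the Monte Carlo estimate $\tilde{V}$ of $T(\hat{V}_k)$ can be determined at the $M$ base points as follows:
         $$\tilde{V}(s_i) = \max_{a \in A} \frac{1}{h} \sum_{j=1}^{h} \left[ r_j^{s_i,a} + \gamma \hat{V}_k\big(s_{k+1}^{s_i,a}\big) \right]$$
         where $r_j^{s_i,a}$ and $s_{k+1}^{s_i,a}$ are the reward and successor state of the $j$-th sample drawn at $(s_i, a)$.
      2. Fit the value function to $\tilde{V}$:
         $$\hat{V}_{k+1} = \arg\min_{f \in F} \sum_{i=1}^{M} \left| f(s_i) - \tilde{V}(s_i) \right|^p$$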
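The two steps can be sketched for the discounted-return case as follows. Everything specific here is an assumption made for illustration: the scalar state in $[0, 1]$, the two actions, the sampling model `sample`, the choice $p = 2$, and the function class $F$ of affine functions $f(s) = a + b\,s$ (the plan of work mentions kernel-based approximation instead, which is not implemented here).

```python
import random


def fit_affine(xs, ys):
    """Least-squares fit (p = 2) over the class F of affine functions f(s) = a + b*s."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fvi_step(coeffs, base_points, actions, sample, gamma, h, rng):
    """One fitted value iteration step: Monte Carlo backup at the base points, then a fit."""
    a0, b0 = coeffs
    v_hat = lambda s: a0 + b0 * s  # current approximation of the value function
    targets = []
    for s in base_points:
        # Step 1: for each action, average h sampled backups r + gamma * V(s'), then maximize.
        targets.append(max(
            sum(r + gamma * v_hat(s_next)
                for s_next, r in (sample(s, a, rng) for _ in range(h))) / h
            for a in actions
        ))
    # Step 2: fit a new function to the estimated values.
    return fit_affine(base_points, targets)


def sample(s, a, rng):
    """Hypothetical generative model: noisy move clipped to [0, 1], reward equal to s."""
    s_next = min(1.0, max(0.0, s + a + rng.gauss(0.0, 0.05)))
    return s_next, s


rng = random.Random(0)
base_points = [i / 10 for i in range(11)]  # the M base points s_i
actions = (-0.1, 0.1)
coeffs = (0.0, 0.0)  # start from the zero function
for _ in range(5):
    coeffs = fvi_step(coeffs, base_points, actions, sample, 0.5, 20, rng)
```

For the reach-avoid problem the backup inside `fvi_step` would be replaced by the reach-avoid recursion, with the indicator of $K$ in place of the reward and the indicator of $A \setminus K$ in place of $\gamma$.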